Anti-viral vectors

ABSTRACT

A viral vector production system is provided which system comprises: (i) a viral genome comprising at least one first nucleotide sequence encoding a gene product capable of binding to and effecting the cleavage, directly or indirectly, of a second nucleotide sequence, or transcription product thereof, encoding a viral polypeptide required for the assembly of viral particles, (ii) a third nucleotide sequence encoding said viral polypeptide required for the assembly of the viral genome into viral particles, which third nucleotide sequence has a different nucleotide sequence to the second nucleotide sequence such that said third nucleotide sequence, or transcription product thereof, is resistant to cleavage directed by said gene product; wherein at least one of the gene products is an external guide sequence capable of binding to and effecting the cleavage by RNase P of the second nucleotide sequence. The viral vector production system may be used to produce viral particles for use in treating or preventing viral infection.

FIELD OF THE INVENTION

The present invention relates to novel viral vectors capable of delivering anti-viral inhibitory RNA molecules to target cells.

BACKGROUND TO THE INVENTION

The application of gene therapy to the treatment of AIDS and HIV infection has been discussed widely (Lever, 1995). The types of therapeutic gene proposed usually fall into one of two broad categories. In the first the gene encodes protein products that inhibit the virus in a number of possible ways. One example of such a protein is the RevM10 derivative of the HIV Rev protein. The RevM10 protein acts as a transdominant negative mutant and so competitively inhibits Rev function in the virus. Like many of the protein-based strategies, the RevM10 protein is a derivative of a native HIV protein. While this provides the basis for the anti-HIV effect, it also has serious disadvantages. In particular, this type of strategy demands that in the absence of the virus there is little or no expression of the gene. Otherwise, healthy cells harbouring the gene become a target for the host cytotoxic T lymphocyte (CTL) system, which recognises the foreign protein. The second broad category of therapeutic gene circumvents these CTL problems. The therapeutic gene encodes inhibitory RNA molecules; RNA is not a target for CTL recognition.

There are several types of inhibitory RNA molecules known: anti-sense RNA, ribozymes, competitive decoys and external guide sequences (EGSs).

External guide sequences, first identified by Forster and Altman (1990), are RNA sequences that are capable of directing the cellular protein RNase P to cleave a particular RNA sequence. In vivo, they are found as part of precursor tRNAs, where they direct cleavage of the tRNA precursor by the cellular riboprotein RNase P to form mature tRNA. However, in principle, any RNA can be targeted by a custom-designed EGS RNA for specific cleavage by RNase P in vitro or in vivo. For example, Yuan et al. (1992) demonstrate a reduction in the levels of chloramphenicol acetyltransferase activity in cells in tissue culture as a result of introducing an appropriately designed EGS.

In recent years a number of laboratories have developed retroviral vector systems based on HIV. In the context of anti-HIV gene therapy these vectors have a number of advantages over the more conventional murine based vectors such as murine leukaemia virus (MLV) vectors. Firstly, HIV vectors would target precisely those cells that are susceptible to HIV infection. Secondly, the HIV-based vector would transduce cells such as macrophages that are normally refractory to transduction by murine vectors. Thirdly, the anti-HIV vector genome would be propagated through the CD4+ cell population by any virus (HIV) that escaped the therapeutic strategy. This is because the vector genome has the packaging signal that will be recognised by the viral particle packaging system. These various attributes make HIV-vectors a powerful tool in the field of anti-HIV gene therapy.

A combination of inhibitory RNA molecules and an HIV-based vector would be attractive as a therapeutic strategy. However, until now this has not been possible. Vector particle production takes place in producer cells which express the packaging components of the particles and package the vector genome. The inhibitory RNA sequences that are designed to destroy the viral RNA would therefore also interrupt the expression of the components of the HIV-based vector system during vector production. The present invention aims to overcome this problem.

SUMMARY OF THE INVENTION

It is therefore an object of the invention to provide a system and method for producing viral particles, in particular HIV particles, which carry nucleotide constructs encoding inhibitory RNA molecules such as external guide sequences, optionally together with other classes of inhibitory RNA molecules such as ribozymes and/or antisense RNAs directed against a corresponding virus, such as HIV, within a target cell, that overcomes the above-mentioned problems. The system includes both a viral genome encoding the inhibitory RNA molecules and nucleotide constructs encoding the components required for packaging the viral genome in a producer cell. However, in contrast to the prior art, although the packaging components have substantially the same amino acid sequence as the corresponding components of the target virus, the inhibitory RNA molecules do not affect production of the viral particles in the producer cells because the nucleotide sequence of the packaging components used in the viral system has been modified to prevent the inhibitory RNA molecules from effecting cleavage or degradation of the RNA transcripts produced from the constructs. Such a viral particle may be used to treat viral infections, in particular HIV infections.

Accordingly the present invention provides a viral vector system comprising:

(i) a first nucleotide sequence encoding an external guide sequence capable of binding to and effecting the cleavage by RNase P of a second nucleotide sequence, or transcription product thereof, encoding a viral polypeptide required for the assembly of viral particles; and

(ii) a third nucleotide sequence encoding said viral polypeptide required for the assembly of viral particles, which third nucleotide sequence has a different nucleotide sequence to the second nucleotide sequence such that the third nucleotide sequence, or transcription product thereof, is resistant to cleavage directed by the external guide sequence.

Preferably, said system further comprises at least one further first nucleotide sequence encoding a gene product capable of binding to and effecting the cleavage, directly or indirectly, of a second nucleotide sequence, or transcription product thereof, encoding a viral polypeptide required for the assembly of viral particles, wherein the gene product is selected from an external guide sequence, a ribozyme and an anti-sense ribonucleic acid.

In another aspect, the present invention provides a viral vector production system comprising:

(i) a viral genome comprising at least one first nucleotide sequence encoding a gene product capable of binding to and effecting the cleavage, directly or indirectly, of a second nucleotide sequence, or transcription product thereof, encoding a viral polypeptide required for the assembly of viral particles;

(ii) a third nucleotide sequence encoding said viral polypeptide required for the assembly of the viral genome into viral particles, which third nucleotide sequence has a different nucleotide sequence to the second nucleotide sequence such that said third nucleotide sequence, or transcription product thereof, is resistant to cleavage directed by said gene product;

wherein at least one of the gene products is an external guide sequence capable of binding to and effecting the cleavage by RNase P of the second nucleotide sequence.

Preferably, in addition to an external guide sequence, at least one gene product is selected from a ribozyme and an anti-sense ribonucleic acid, preferably a ribozyme.

Preferably, the viral vector is a retroviral vector, more preferably a lentiviral vector, such as an HIV vector. The second nucleotide sequence and the third nucleotide sequence are typically from the same viral species, more preferably from the same viral strain. Generally, the viral genome is also from the same viral species, more preferably from the same viral strain.

In the case of retroviral vectors, the polypeptide required for the assembly of viral particles is selected from gag, pol and env proteins. Preferably at least the gag and pol sequences are lentiviral sequences, more preferably HIV sequences. Alternatively, or in addition, the env sequence is a lentiviral sequence, more preferably an HIV sequence.

In a preferred embodiment, the third nucleotide sequence is resistant to cleavage directed by the gene product as a result of one or more conservative alterations in the nucleotide sequence which remove cleavage sites recognised by the at least one gene product and/or binding sites for the at least one gene product. For example, where the gene product is an EGS, the third nucleotide sequence is adapted to prevent EGS binding and/or to remove the RNase P consensus cleavage site. Alternatively, where the gene product is a ribozyme, the third nucleotide sequence is adapted to be resistant to cleavage by the ribozyme.

Preferably the third nucleotide sequence is codon optimised for expression in host cells. The host cells, which term includes producer cells and packaging cells, are typically mammalian cells.
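The idea of altering the nucleotide sequence while leaving the encoded polypeptide unchanged can be illustrated with a minimal Python sketch. It is an illustration only, not the procedure used to prepare the constructs of the invention: the input fragment is hypothetical, the function names are invented for this example, and the rule "swap every codon for the alphabetically first synonymous codon" is an assumption chosen for simplicity. It is not a codon optimisation for any particular host.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def recode_synonymously(dna):
    """Replace each codon with a different synonymous codon where one exists."""
    out = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        alternatives = sorted(
            c for c, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[codon] and c != codon
        )
        # Met (ATG) and Trp (TGG) have no synonymous codon and are kept as-is.
        out.append(alternatives[0] if alternatives else codon)
    return "".join(out)


wild_type = "ATGGGTGCGAGAGCGTCA"  # hypothetical in-frame fragment
recoded = recode_synonymously(wild_type)
changed = sum(a != b for a, b in zip(wild_type, recoded))
print(recoded, translate(recoded), changed)
```

The recoded fragment encodes the same amino acids as the input but differs at several nucleotide positions, which is the property relied upon to disrupt binding or cleavage sites in the corresponding transcript.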

In a particularly preferred embodiment, (i) the viral genome is an HIV genome comprising nucleotide sequences encoding anti-HIV EGSs and optionally anti-HIV ribozyme sequences directed against HIV packaging component sequences (such as gag.pol) in a target HIV and (ii) the viral system for producing packaged HIV particles further comprises nucleotide constructs encoding the same packaging components (such as gag.pol proteins) as in the target HIV wherein the sequence of the nucleotide constructs is different from that found in the target HIV so that the anti-HIV EGS and anti-HIV ribozyme sequences cannot effect cleavage or degradation of the gag.pol transcripts during production of the HIV particles in producer cells.

The present invention also provides a viral particle comprising a viral vector according to the present invention and one or more polypeptides encoded by the third nucleotide sequences according to the present invention. For example the present invention provides a viral particle produced using the viral vector production system of the invention.

In another aspect, the present invention provides a method for producing a viral particle which method comprises introducing into a host cell (i) a viral genome vector according to the present invention; (ii) one or more third nucleotide sequences according to the present invention; and (iii) nucleotide sequences encoding the other essential viral packaging components not encoded by the one or more third nucleotide sequences.

The present invention further provides a viral particle produced by the method of the invention.

The present invention also provides a pharmaceutical composition comprising a viral particle according to the present invention together with a pharmaceutically acceptable carrier or diluent.

The viral system of the invention or viral particles of the invention may be used to treat viral infections, particularly retroviral infections such as lentiviral infections including HIV infections. Thus the present invention provides a method of treating a viral infection which method comprises administering to a human or animal patient suffering from the viral infection an effective amount of a viral system, viral particle or pharmaceutical composition of the present invention.

The invention relates in particular to HIV-based vectors carrying anti-HIV EGSs. However, the invention can be applied to any other virus, in particular any other lentivirus, for which treatment by gene therapy may be desirable. The invention is illustrated herein for HIV, but this is not considered to limit the scope of the invention to HIV-based anti-HIV vectors.

DETAILED DESCRIPTION OF THE INVENTION

The term “viral vector” refers to a nucleotide construct comprising a viral genome capable of being transcribed in a host cell, which genome comprises sufficient viral genetic information to allow packaging of the viral RNA genome, in the presence of packaging components, into a viral particle capable of infecting a target cell. Infection of the target cell includes reverse transcription and integration into the target cell genome, where appropriate for particular viruses. The viral vector in use typically carries heterologous coding sequences (nucleotides of interest) which are to be delivered by the vector to the target cell, for example a first nucleotide sequence encoding an EGS. A viral vector is incapable of independent replication to produce infectious viral particles within the final target cell.

The term “viral vector system” is intended to mean a kit of parts which can be used when combined with other necessary components for viral particle production to produce viral particles in host cells. For example, the first nucleotide sequence may typically be present in a plasmid vector construct suitable for cloning the first nucleotide sequence into a viral genome vector construct. When combined in a kit with a third nucleotide sequence, which will also typically be present in a separate plasmid vector construct, the resulting combination of plasmid containing the first nucleotide sequence and plasmid containing the third nucleotide sequence comprises the essential elements of the invention. Such a kit may then be used by the skilled person in the production of suitable viral vector genome constructs which when transfected into a host cell together with the plasmid containing the third nucleotide sequence, and optionally nucleic acid constructs encoding other components required for viral assembly, will lead to the production of infectious viral particles.

Alternatively, the third nucleotide sequence may be stably present within a packaging cell line that is included in the kit.

The kit may include the other components needed to produce viral particles, such as host cells and other plasmids encoding essential viral polypeptides required for viral assembly. By way of example, the kit may contain (i) a plasmid containing a first nucleotide sequence encoding an anti-HIV EGS and (ii) a plasmid containing a third nucleotide sequence encoding a modified HIV gag.pol construct whose transcript is resistant to cleavage directed by the anti-HIV EGS. Optional components would then be (a) an HIV viral genome construct with suitable restriction enzyme recognition sites for cloning the first nucleotide sequence into the viral genome; (b) a plasmid encoding a VSV-G env protein. Alternatively, nucleotide sequences encoding viral polypeptides required for assembly of viral particles may be provided in the kit as packaging cell lines comprising the nucleotide sequences, for example a VSV-G expressing cell line.

The term “viral vector production system” refers to the viral vector system described above wherein the first nucleotide sequence has already been inserted into a suitable viral vector genome.

Viral vectors are typically retroviral vectors, in particular lentiviral vectors such as HIV vectors. The retroviral vector of the present invention may be derived from or may be derivable from any suitable retrovirus. A large number of different retroviruses have been identified. Examples include: murine leukemia virus (MLV), human immunodeficiency virus (HIV), simian immunodeficiency virus, human T-cell leukemia virus (HTLV), equine infectious anaemia virus (EIAV), mouse mammary tumour virus (MMTV), Rous sarcoma virus (RSV), Fujinami sarcoma virus (FuSV), Moloney murine leukemia virus (Mo-MLV), FBR murine osteosarcoma virus (FBR MSV), Moloney murine sarcoma virus (Mo-MSV), Abelson murine leukemia virus (A-MLV), Avian myelocytomatosis virus-29 (MC29), and Avian erythroblastosis virus (AEV). A detailed list of retroviruses may be found in Coffin et al., 1997, “Retroviruses”, Cold Spring Harbor Laboratory Press, Eds: J M Coffin, S M Hughes, H E Varmus, pp 758-763.

Details on the genomic structure of some retroviruses may be found in the art. By way of example, details on HIV and Mo-MLV may be found from the NCBI Genbank (Genome Accession Nos. AF033819 and AF033811, respectively).

The lentivirus group can be split even further into “primate” and “non-primate”. Examples of primate lentiviruses include human immunodeficiency virus (HIV), the causative agent of human acquired immunodeficiency syndrome (AIDS), and simian immunodeficiency virus (SIV). The non-primate lentiviral group includes the prototype “slow virus” visna/maedi virus (VMV), as well as the related caprine arthritis-encephalitis virus (CAEV), equine infectious anaemia virus (EIAV) and the more recently described feline immunodeficiency virus (FIV) and bovine immunodeficiency virus (BIV).

The basic structure of a retrovirus genome is a 5′ LTR and a 3′ LTR, between or within which are located a packaging signal to enable the genome to be packaged, a primer binding site, integration sites to enable integration into a host cell genome and gag, pol and env genes encoding the packaging components, which are polypeptides required for the assembly of viral particles. More complex retroviruses have additional features, such as rev and RRE sequences in HIV, which enable the efficient export of RNA transcripts of the integrated provirus from the nucleus to the cytoplasm of an infected target cell.

In the provirus, these genes are flanked at both ends by regions called long terminal repeats (LTRs). The LTRs are responsible for proviral integration and transcription. LTRs also serve as enhancer-promoter sequences and can control the expression of the viral genes. Encapsidation of the retroviral RNAs occurs by virtue of a psi sequence located at the 5′ end of the viral genome.

The LTRs themselves are identical sequences that can be divided into three elements, which are called U3, R and U5. U3 is derived from the sequence unique to the 3′ end of the RNA. R is derived from a sequence repeated at both ends of the RNA and U5 is derived from the sequence unique to the 5′ end of the RNA. The sizes of the three elements can vary considerably among different retroviruses.

In a defective retroviral vector genome gag, pol and env may be absent or not functional. The R regions at both ends of the RNA are repeated sequences. U5 and U3 represent unique sequences at the 5′ and 3′ ends of the RNA genome respectively.

In a typical retroviral vector for use in gene therapy, at least part of one or more of the gag, pol and env protein coding regions essential for replication may be removed from the virus. This makes the retroviral vector replication-defective. The removed portions may even be replaced by a nucleotide sequence of interest (NOI), such as a first nucleotide sequence of the invention, to generate a virus capable of integrating its genome into a host genome but wherein the modified viral genome is unable to propagate itself due to a lack of structural proteins. When integrated in the host genome, expression of the NOI occurs, resulting in, for example, a therapeutic and/or a diagnostic effect. Thus, the transfer of an NOI into a site of interest is typically achieved by: integrating the NOI into the recombinant viral vector; packaging the modified viral vector into a virion coat; and allowing transduction of a site of interest, such as a targeted cell or a targeted cell population.

A minimal retroviral genome for use in the present invention will therefore comprise (5′) R-U5-one or more first nucleotide sequences-U3-R (3′). However, the plasmid vector used to produce the retroviral genome within a host cell/packaging cell will also include transcriptional regulatory control sequences operably linked to the retroviral genome to direct transcription of the genome in a host cell/packaging cell. These regulatory sequences may be the natural sequences associated with the transcribed retroviral sequence, i.e. the 5′ U3 region, or they may be a heterologous promoter such as another viral promoter, for example the CMV promoter.

Some retroviral genomes require additional sequences for efficient virus production. For example, in the case of HIV, rev and RRE sequences are preferably included. However the requirement for rev and RRE can be reduced or eliminated by codon optimisation.

Once the retroviral vector genome is integrated into the genome of its target cell as proviral DNA, the inhibitory RNA sequences (for example EGS or ribozyme sequences) need to be expressed. In a retrovirus, the promoter is located in the 5′ LTR U3 region of the provirus. In retroviral vectors, the promoter driving expression of a therapeutic gene may be the native retroviral promoter in the 5′ U3 region, or an alternative promoter engineered into the vector. The alternative promoter may physically replace the 5′ U3 promoter native to the retrovirus, or it may be incorporated at a different place within the vector genome such as between the LTRs.

Thus, the first nucleotide sequence will also be operably linked to a transcriptional regulatory control sequence to allow transcription of the first nucleotide sequence to occur in the target cell. The control sequence will typically be active in mammalian cells. The control sequence may, for example, be a viral promoter such as the natural viral promoter or a CMV promoter or it may be a mammalian promoter. It is particularly preferred to use a promoter that is preferentially active in a particular cell type or tissue type that the virus to be treated primarily infects. Thus, in one embodiment, tissue-specific regulatory sequences may be used. The regulatory control sequences driving expression of the one or more first nucleotide sequences may be constitutive or regulated promoters.

Replication-defective retroviral vectors are typically propagated, for example to prepare suitable titres of the retroviral vector for subsequent transduction, by using a combination of a packaging or helper cell line and the recombinant vector. That is to say, the three packaging proteins can be provided in trans.

A “packaging cell line” contains one or more of the retroviral gag, pol and env genes. The packaging cell line produces the proteins required for packaging retroviral DNA but it cannot bring about encapsidation due to the lack of a psi region. However, when a recombinant vector carrying an NOI and a psi region is introduced into the packaging cell line, the helper proteins can package the psi-positive recombinant vector to produce the recombinant virus stock. This virus stock can be used to transduce cells to introduce the NOI into the genome of the target cells. It is preferred to use a psi packaging signal, called psi plus, that contains additional sequences spanning from upstream of the splice donor to downstream of the gag start codon (Bender et al., 1987) since this has been shown to increase viral titres.

The recombinant virus whose genome lacks all genes required to make viral proteins can transduce only once and cannot propagate. These viral vectors which are only capable of a single round of transduction of target cells are known as replication defective vectors.

Hence, the NOI is introduced into the host/target cell genome without the generation of potentially harmful retrovirus. A summary of the available packaging lines is presented in Coffin et al., 1997 (ibid).

Retroviral packaging cell lines in which the gag, pol and env viral coding regions are carried on separate expression plasmids that are independently transfected into a packaging cell line are preferably used. This strategy, sometimes referred to as the three plasmid transfection method (Soneoka et al., 1995), reduces the potential for production of a replication-competent virus since three recombination events are required for wild type viral production. As recombination is greatly facilitated by homology, reducing or eliminating homology between the genomes of the vector and the helper can also be used to reduce the problem of replication-competent helper virus production.

An alternative to stably transfected packaging cell lines is to use transiently transfected cell lines. Transient transfections may advantageously be used to measure levels of vector production when vectors are being developed. In this regard, transient transfection avoids the longer time required to generate stable vector-producing cell lines and may also be used if the vector or retroviral packaging components are toxic to cells. Components typically used to generate retroviral vectors include a plasmid encoding the gag/pol proteins, a plasmid encoding the env protein and a plasmid containing an NOI. Vector production involves transient transfection of one or more of these components into cells containing the other required components. If the vector encodes toxic genes or genes that interfere with the replication of the host cell, such as inhibitors of the cell cycle or genes that induce apoptosis, it may be difficult to generate stable vector-producing cell lines, but transient transfection can be used to produce the vector before the cells die. Also, cell lines have been developed using transient transfection that produce vector titre levels that are comparable to the levels obtained from stable vector-producing cell lines (Pear et al., 1993).

Producer cells/packaging cells can be of any suitable cell type. Most commonly, mammalian producer cells are used but other cells, such as insect cells, are not excluded. Clearly, the producer cells will need to be capable of efficiently translating the env and gag, pol mRNA. Many suitable producer/packaging cell lines are known in the art. The skilled person is also capable of making suitable packaging cell lines by, for example, stably introducing a nucleotide construct encoding a packaging component into a cell line.

As will be discussed below, where the retroviral genome encodes an inhibitory RNA molecule capable of effecting the cleavage of gag, pol and/or env RNA transcripts, the nucleotide sequences present in the packaging cell line, either integrated or carried on plasmids, or in the transiently transfected producer cell line, which encode gag, pol and/or env proteins will be modified so as to reduce or prevent binding of the inhibitory RNA molecule(s). In this way, the inhibitory RNA molecule(s) will not prevent expression of components in packaging cell lines that are essential for packaging of viral particles.

It is highly desirable to use high-titre virus preparations in both experimental and practical applications. Techniques for increasing viral titre include using a psi plus packaging signal as discussed above and concentration of viral stocks. In addition, the use of different envelope proteins, such as the G protein from vesicular stomatitis virus, has improved titres following concentration to 10⁹ per ml (Cosset et al., 1995). However, typically the envelope protein will be chosen such that the viral particle will preferentially infect cells that are infected with the virus which it is desired to treat. For example where an HIV vector is being used to treat HIV infection, the env protein used will be the HIV env protein.

Suitable first nucleotide sequences for use according to the present invention encode gene products that result in the cleavage and/or enzymatic degradation of a target nucleotide sequence, which will generally be a ribonucleotide. As particular examples, EGSs, ribozymes, and antisense sequences may be mentioned, more specifically EGSs.

External guide sequences (EGSs) are RNA sequences that bind to a complementary target sequence to form a loop in the target RNA sequence, the overall structure being a substrate for RNase P-mediated cleavage of the target RNA sequence. The structure that forms when the EGS anneals to the target RNA is very similar to that found in a tRNA precursor. Thus the natural activity of RNase P can be directed to cleave a target RNA by designing a suitable EGS. The general rules for EGS design are as follows, with reference to the generic EGSs shown in FIG. 9B:

Rules for EGS Design in Mammalian Cells (See FIG. 9B)

Target sequence: All tRNA precursor molecules have a G immediately 3′ of the RNase P cleavage site (i.e. the G forms a base pair with the C at the top of the acceptor stem prior to the ACCA sequence). In addition a U is found 8 nucleotides downstream in all tRNAs (i.e. G at position 1, U at position 8). A pyrimidine may be preferred 5′ of the cut site. No other specific target sequences are required.

EGS sequence: A 7 nucleotide ‘acceptor stem’ analogue is optimal (5′ hybridising arm). A 4 nucleotide ‘D-stem’ analogue is preferred (3′ hybridising arm). Variation in this length may alter the reaction kinetics. This will be specific to each target site. A consensus ‘T-stem and loop’ analogue is essential. Minimal 5′ and 3′ non-pairing sequences are preferred to reduce the potential for undesired folding of the EGS RNA.

Deletion of the ‘anti-codon stem and loop’ analogue may be beneficial. Deletion of the variable loop can also be tolerated in vitro but an optimal replacement loop for the deletion of both has not been defined in vivo.
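The target-sequence rule above (G at position 1, U at position 8, with a pyrimidine optionally preferred 5′ of the cut site) can be expressed as a simple scan. The following Python sketch is illustrative only. It assumes that positions are counted from the G immediately 3′ of the prospective cut, so that the U lies seven nucleotides after the G; the function name and the example fragment are hypothetical. It does not model the EGS itself or any structural requirement of RNase P.

```python
def candidate_egs_sites(rna):
    """Return (index_of_G, pyrimidine_upstream) for each candidate site.

    index_of_G is the 0-based index of the G taken as position 1, i.e. the
    first nucleotide 3' of the prospective RNase P cut.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    sites = []
    for i, base in enumerate(rna):
        if base != "G" or i + 7 >= len(rna):
            continue
        if rna[i + 7] != "U":  # position 8, counted from the G at position 1
            continue
        # Optional preference: a pyrimidine immediately 5' of the cut site.
        pyrimidine_upstream = i > 0 and rna[i - 1] in "CU"
        sites.append((i, pyrimidine_upstream))
    return sites


example = "ACGAAAAAAUGG"  # hypothetical target fragment
print(candidate_egs_sites(example))
```

Candidates found in this way would still need to be assessed individually, since the text notes that hybridising arm length and reaction kinetics are specific to each target site.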

As with ribozymes, described below, it is preferred to use more than one EGS. Preferably, a plurality of EGSs is employed, together capable of cleaving gag, pol and env RNA of the native retrovirus at a plurality of sites. Since HIV exists as a population of quasispecies, not all of the target sequences for the EGSs will be included in all HIV variants. The problem presented by this variability can be overcome by using multiple EGSs. Multiple EGSs can be included in series in a single vector and can function independently when expressed as a single RNA sequence. A single RNA containing two or more EGSs having different target recognition sites may be referred to as a multitarget EGS.
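The benefit of targeting several sites in a variable population can be illustrated numerically. The Python sketch below uses invented variant sequences and invented target sites, and it treats a variant as covered if it contains an exact copy of at least one target site. That exact-match criterion is a simplifying assumption and not a model of EGS binding.

```python
def covered_fraction(variants, target_sites):
    """Fraction of variants containing at least one of the target sites."""
    covered = sum(any(site in v for site in target_sites) for v in variants)
    return covered / len(variants)


# Hypothetical variant fragments standing in for a quasispecies population.
variants = ["AAGUCAUU", "AAGACAUU", "CCGGAUCC"]

single = covered_fraction(variants, ["GUCA"])
double = covered_fraction(variants, ["GUCA", "GGAU"])
print(single, double)
```

In this toy population a second target site raises the covered fraction from one variant in three to two in three, which is the rationale given above for a multitarget EGS.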

Further guidance may be obtained by reference to, for example, Werner et al. (1997); Werner et al. (1998); Ma et al. (1998) and Kawa et al. (1998).

Ribozymes are RNA enzymes which cleave RNA at specific sites. Ribozymes can be engineered so as to be specific for any chosen sequence containing a ribozyme cleavage site. Thus, ribozymes can be engineered which have chosen recognition sites in transcribed viral sequences. By way of an example, ribozymes encoded by the first nucleotide sequence recognise and cleave essential elements of viral genomes required for the production of viral particles, such as packaging components. Thus, for retroviral genomes, such essential elements include the gag, pol and env gene products. A suitable ribozyme capable of recognising at least one of the gag, pol and env gene sequences, or more typically, the RNA sequences transcribed from these genes, is able to bind to and cleave such a sequence. This will reduce or prevent production of the gag, pol or env protein as appropriate and thus reduce or prevent the production of retroviral particles.

Ribozymes come in several forms, including hammerhead, hairpin and hepatitis delta antigenomic ribozymes. Preferred for use herein are hammerhead ribozymes, in part because of their relatively small size, because the sequence requirements for their target cleavage site are minimal and because they have been well characterised. The ribozymes most commonly used in research at present are hammerhead and hairpin ribozymes.

Each individual ribozyme has a motif which recognises and binds to a recognition site in the target RNA. This motif takes the form of one or more “binding arms”, generally two binding arms. The binding arms in hammerhead ribozymes are the flanking sequences Helix I and Helix III, which flank Helix II. These can be of variable length, usually between 6 and 10 nucleotides each, but can be shorter or longer. The length of the flanking sequences can affect the rate of cleavage. For example, it has been found that reducing the total number of nucleotides in the flanking sequences from 20 to 12 can increase the turnover rate of the ribozyme cleaving an HIV sequence 10-fold (Goodchild et al., 1991). A catalytic motif in the ribozyme Helix II in hammerhead ribozymes cleaves the target RNA at a site which is referred to as the cleavage site. Whether or not a ribozyme will cleave any given RNA is determined by the presence or absence of a recognition site for the ribozyme containing an appropriate cleavage site.
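As an illustration of how binding arm sequences relate to the target, the following Python sketch derives two antisense arms from the sequences flanking a chosen cleavage position. It rests on assumptions not stated in the text: that the nucleotide immediately 5′ of the cut (the X of the cleavage triplet) is left unpaired, that the arms are exact reverse complements of the flanks, and that the catalytic core is supplied separately. The function names and the target fragment are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def hammerhead_arms(target, cut, arm_len=8):
    """Derive two binding arms for a cut site in `target`.

    `cut` is the number of target nucleotides lying 5' of the cleavage site.
    Returns (arm pairing with the 5' flank, arm pairing with the 3' flank).
    """
    # Assumption: the nucleotide at index cut - 1 is not base-paired.
    upstream = target[max(0, cut - 1 - arm_len):cut - 1]
    downstream = target[cut:cut + arm_len]
    return reverse_complement(upstream), reverse_complement(downstream)


target = "AAAACCCCGUCGGGGUUUU"  # hypothetical target with a GUC triplet
cut = target.index("GUC") + 3    # cleavage taken to follow the triplet
arms = hammerhead_arms(target, cut, arm_len=4)
print(arms, sum(len(a) for a in arms))
```

Changing `arm_len` changes the total number of flanking nucleotides, the quantity that the text reports can affect the turnover rate.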

Each type of ribozyme recognises its own cleavage site. The hammerhead ribozyme cleavage site has the nucleotide base triplet GUX directly upstream, where G is guanine, U is uracil and X is any nucleotide base. Hairpin ribozymes have a cleavage site of BCUGNYR, where B is any nucleotide base other than adenine, N is any nucleotide, Y is cytosine or thymine and R is guanine or adenine. Cleavage by hairpin ribozymes takes place between the G and the N in the cleavage site.
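The two site definitions above can be written as simple pattern searches. The following is a minimal sketch only: the function names are hypothetical, sequences are converted to RNA before matching (so the pyrimidine class Y is matched as C or U), and a match identifies only a candidate cleavage site, not a site that a given ribozyme will actually cut.

```python
import re

# Base classes as defined in the text: X and N are any base, B is any base
# except adenine, Y is a pyrimidine and R is a purine. Lookaheads are used
# so that overlapping sites are all reported.
HAMMERHEAD_SITE = re.compile(r"(?=GU[ACGU])")
HAIRPIN_SITE = re.compile(r"(?=[CGU]CUG[ACGU][CU][AG])")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write it as RNA (T becomes U)."""
    return seq.upper().replace("T", "U")


def hammerhead_sites(seq: str) -> list:
    """Return the 0-based position of the G in every GUX triplet."""
    return [m.start() for m in HAMMERHEAD_SITE.finditer(to_rna(seq))]


def hairpin_sites(seq: str) -> list:
    """Return (start, cut) pairs for every BCUGNYR match.

    `cut` is the index of N, i.e. the first base after the G-N bond
    that the text identifies as the point of cleavage.
    """
    return [(m.start(), m.start() + 4) for m in HAIRPIN_SITE.finditer(to_rna(seq))]


if __name__ == "__main__":
    # GAG 1 target site from Reference Example 1, and a made-up hairpin test string.
    print(hammerhead_sites("UAGUAAGAAUGUAUAGCCCUAC"))
    print(hairpin_sites("aagctgacgaa"))
```

Either DNA or RNA input may be supplied, since both are normalised to RNA before the search.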

The nucleic acid sequences encoding the packaging components (the “third nucleotide sequences”) may be resistant to the ribozyme or ribozymes because they lack any cleavage sites for the ribozyme or ribozymes. This prohibits enzymatic activity by the ribozyme or ribozymes and therefore there is no effective recognition site for the ribozyme or ribozymes. Alternatively or additionally, the potential recognition sites may be altered in the flanking sequences which form the part of the recognition site to which the ribozyme binds. This either eliminates binding of the ribozyme motif to the recognition site, or reduces binding capability enough to destabilise any ribozyme-target complex and thus reduce the specificity and catalytic activity of the ribozyme. Where the flanking sequences only are altered, they are preferably altered such that catalytic activity of the ribozyme at the altered target sequence is negligible and is effectively eliminated.

Preferably, a series of several anti-HIV ribozymes is employed in the invention. These can be any anti-HIV ribozymes but must include one or more which cleave the RNA that is required for the expression of gag, pol or env. Preferably, a plurality of ribozymes is employed, together capable of cleaving gag, pol and env RNA of the native retrovirus at a plurality of sites. Since HIV exists as a population of quasispecies, not all of the target sequences for the ribozymes will be included in all HIV variants. The problem presented by this variability can be overcome by using multiple ribozymes. Multiple ribozymes can be included in series in a single vector and can function independently when expressed as a single RNA sequence. A single RNA containing two or more ribozymes having different target recognition sites may be referred to as a multitarget ribozyme. The placement of ribozymes in series has been demonstrated to enhance cleavage. The use of a plurality of ribozymes is not limited to treating HIV infection but may be used in relation to other viruses, retroviruses or otherwise.

Antisense technology is well known in the art. There are various mechanisms by which antisense sequences are believed to inhibit gene expression. One mechanism by which antisense sequences are believed to function is the recruitment of the cellular protein RNaseH to the target sequence/antisense construct heteroduplex, which results in cleavage and degradation of the heteroduplex. Thus the antisense construct, by contrast to ribozymes, can be said to lead indirectly to cleavage/degradation of the target sequence. Thus according to the present invention, a first nucleotide sequence may encode an antisense RNA that binds to either a gene encoding an essential/packaging component or the RNA transcribed from said gene such that expression of the gene is inhibited, for example as a result of RNaseH degradation of a resulting heteroduplex. It is not necessary for the antisense construct to encode the entire complementary sequence of the gene encoding an essential/packaging component; a portion may suffice. The skilled person will easily be able to determine how to design a suitable antisense construct.

By contrast, the nucleic acid sequences encoding the essential/packaging components of the viral particles required for the assembly of viral particles in the host cells/producer cells/packaging cells (the third nucleotide sequences) are resistant to the inhibitory RNA molecules encoded by the first nucleotide sequence. For example in the case of ribozymes, resistance is typically by virtue of alterations in the sequences which eliminate the ribozyme recognition sites. At the same time, the amino acid coding sequence for the essential/packaging components is retained so that the viral components encoded by the sequences remain the same, or at least sufficiently similar that the function of the essential/packaging components is not compromised.
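The two conditions described above (the recognition site is eliminated while the encoded amino acids are retained) can be checked mechanically. The following is a minimal sketch under stated assumptions: `WILD_TYPE` is the in-frame 30-nucleotide stretch of the wild type gag sequence that contains the GAG 1 target site of Reference Example 1, whereas `ALTERED` is a hypothetical synonymous recoding written for illustration and is not taken from the synthetic sequence of the invention. Absence of an exact match is used as a simple proxy for resistance; it does not model partial binding.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def is_resistant(wild_type: str, altered: str, target_site_rna: str) -> bool:
    """True if the site is in the wild type, absent from the altered
    sequence, and both sequences encode the same amino acids."""
    site = target_site_rna.upper().replace("U", "T")
    same_protein = translate(wild_type) == translate(altered)
    return same_protein and site in wild_type.upper() and site not in altered.upper()


WILD_TYPE = "aataaaatagtaagaatgtatagccctacc"   # wild type gag fragment
ALTERED = "AACAAGATCGTGCGCATGTACAGCCCCACC"     # hypothetical recoding (assumption)
GAG_1_SITE = "UAGUAAGAAUGUAUAGCCCUAC"           # GAG 1 target site

if __name__ == "__main__":
    print(translate(WILD_TYPE), translate(ALTERED))
    print(is_resistant(WILD_TYPE, ALTERED, GAG_1_SITE))
```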

The term “viral polypeptide required for the assembly of viral particles” means a polypeptide normally encoded by the viral genome to be packaged into viral particles, in the absence of which the viral genome cannot be packaged. For example, in the context of retroviruses such polypeptides would include gag, pol and env. The terms “packaging component” and “essential component” are also included within this definition.

In the case of antisense sequences, the third nucleotide sequence differs from the second nucleotide sequence, which encodes the viral packaging component targeted by the antisense sequence, to the extent that although the antisense sequence can bind to the second nucleotide sequence, or transcript thereof, the antisense sequence cannot bind effectively to the third nucleotide sequence or RNA transcribed therefrom. The changes between the second and third nucleotide sequences will typically be conservative changes, although a small number of amino acid changes may be tolerated provided that, as described above, the function of the essential/packaging components is not significantly impaired.

Preferably, in addition to eliminating the inhibitory RNA recognition sites, the alterations to the coding sequences for the viral components improve the sequences for codon usage in the mammalian cells or other cells which are to act as the producer cells for retroviral vector particle production. This improvement in codon usage is referred to as “codon optimisation”. Many viruses, including HIV and other lentiviruses, use a large number of rare codons and by changing these to correspond to commonly used mammalian codons, increased expression of the packaging components in mammalian producer cells can be achieved. Codon usage tables are known in the art for mammalian cells, as well as for a variety of other organisms.
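By way of illustration only, codon optimisation of this kind can be sketched as a translation followed by a back-translation using one preferred codon per amino acid. The `PREFERRED_CODON` table below is an assumption made for this sketch (one frequently used mammalian codon per amino acid) and is not the table used to derive the synthetic sequences of the invention; a practical scheme would consult a full codon usage table and may leave selected regions unchanged.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative choice of one preferred codon per amino acid (assumption).
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def codon_optimise(dna: str) -> str:
    """Re-encode every amino acid with its preferred codon."""
    return "".join(PREFERRED_CODON[aa] for aa in translate(dna))


def changed_codons(before: str, after: str) -> int:
    """Count codon positions at which two equal-length sequences differ."""
    before, after = before.upper(), after.upper()
    return sum(before[i:i + 3] != after[i:i + 3] for i in range(0, len(before) - 2, 3))


if __name__ == "__main__":
    wild_type = "aataaaatagtaagaatgtatagccctacc"  # wild type gag fragment
    optimised = codon_optimise(wild_type)
    print(optimised, changed_codons(wild_type, optimised))
```

Because every codon is replaced by a synonymous one, the encoded polypeptide is unchanged while the nucleotide sequence, and hence any recognition site lying within it, is altered.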

Thus preferably, the sequences encoding the packaging components are codon optimised. More preferably, the sequences are codon optimised in their entirety. Following codon optimisation, it is found that there are numerous sites in the wild type gag, pol and env sequences which can serve as inhibitory RNA recognition sites and which are no longer present in the sequences encoding the packaging components. In an alternative but less practical strategy, the sequences encoding the packaging components can be altered by targeted conservative alterations so as to render them resistant to selected inhibitory RNAs capable of effecting the cleavage of the wild type sequences.

An additional advantage of codon optimising HIV packaging components is that this can increase gene expression. In particular, it can render gag, pol expression Rev independent so that rev and RRE need not be included in the genome (Haas et al., 1996). Rev-independent vectors are therefore possible. This in turn enables the use of anti-rev or RRE factors in the retroviral vector.

As described above, the packaging components for a retroviral vector include expression products of gag, pol and env genes. In accordance with the present invention, gag and pol employed in the packaging system are derived from the target retrovirus on which the vector genome is based. Thus, in the RNA transcript form, gag and pol would normally be cleavable by the ribozymes present in the vector genome. The env gene employed in the packaging system may be derived from a different virus, including other retroviruses such as MLV and non-retroviruses such as VSV (a Rhabdovirus), in which case it may not need any sequence alteration to render it resistant to cleavage effected by the inhibitory RNA(s). Alternatively, env may be derived from the same retrovirus as gag and pol, in which case any recognition sites for the inhibitory RNA(s) will need to be eliminated by sequence alteration.

The process of producing a retroviral vector in which the envelope protein is not the native envelope of the retrovirus is known as “pseudotyping”. Certain envelope proteins, such as MLV envelope protein and vesicular stomatitis virus G (VSV-G) protein, pseudotype retroviruses very well. Pseudotyping can be useful for altering the target cell range of the retrovirus. Alternatively, to maintain target cell specificity for target cells infected with the particular virus it is desired to treat, the envelope protein may be the same as that of the target virus, for example HIV.

Other therapeutic coding sequences may be present along with the first nucleotide sequence or sequences. Other therapeutic coding sequences include, but are not limited to, sequences encoding cytokines, hormones, antibodies, immunoglobulin fusion proteins, enzymes, immune co-stimulatory molecules, anti-sense RNA, a transdominant negative mutant of a target protein, a toxin, a conditional toxin, an antigen, a single chain antibody, tumour suppressor protein and growth factors. When included, such coding sequences are operatively linked to a suitable promoter, which may be the promoter driving expression of the first nucleotide sequence or a different promoter or promoters.

Thus the invention comprises two components. The first is a genome construction that will be packaged by viral packaging components and which carries a series of anti-viral inhibitory RNA molecules such as anti-HIV EGSs. These could be any anti-HIV EGSs but the key issue for this invention is that some of them result in cleavage of RNA that is required for the expression of native or wild type HIV gag, pol or env coding sequences. The second component is the packaging system, which comprises a cassette for the expression of HIV gag, pol and a cassette either for HIV env or an envelope gene encoding a pseudotyping envelope protein, the packaging system being resistant to the inhibitory RNA molecules.

The viral particles of the present invention, and the viral vector system and methods used to produce them, may thus be used to treat or prevent viral infections, preferably retroviral infections, in particular lentiviral, especially HIV, infections. Specifically, the viral particles of the invention, typically produced using the viral vector system of the present invention, may be used to deliver inhibitory RNA molecules to a human or animal in need of treatment for a viral infection.

Alternatively, or in addition, the viral production system may be used to transfect, ex vivo, cells obtained from a patient, which are then returned to the patient. Patient cells transfected ex vivo may be formulated as a pharmaceutical composition (see below) prior to readministration to the patient.

Preferably the viral particles are combined with a pharmaceutically acceptable carrier or diluent to produce a pharmaceutical composition. Thus, the present invention also provides a pharmaceutical composition for treating an individual, wherein the composition comprises a therapeutically effective amount of the viral particle of the present invention, together with a pharmaceutically acceptable carrier, diluent, excipient or adjuvant. The pharmaceutical composition may be for human or animal usage.

The choice of pharmaceutical carrier, excipient or diluent can be selected with regard to the intended route of administration and standard pharmaceutical practice. Suitable carriers and diluents include isotonic saline solutions, for example phosphate-buffered saline. The pharmaceutical compositions may comprise as, or in addition to, the carrier, excipient or diluent any suitable binder(s), lubricant(s), suspending agent(s), coating agent(s), solubilising agent(s), and other carrier agents that may aid or increase the viral entry into the target site (such as for example a lipid delivery system).

The pharmaceutical composition may be formulated for parenteral, intramuscular, intravenous, intracranial, subcutaneous, oral, intraocular or transdermal administration.

Where appropriate, the pharmaceutical compositions can be administered by any one or more of inhalation, in the form of a suppository or pessary, topically in the form of a lotion, solution, cream, ointment or dusting powder, by use of a skin patch, orally in the form of tablets containing excipients such as starch or lactose, or in capsules or ovules either alone or in admixture with excipients, or in the form of elixirs, solutions or suspensions containing flavouring or colouring agents, or they can be injected parenterally, for example intracavernosally, intravenously, intramuscularly or subcutaneously. For parenteral administration, the compositions may be best used in the form of a sterile aqueous solution which may contain other substances, for example enough salts or monosaccharides to make the solution isotonic with blood. For buccal or sublingual administration the compositions may be administered in the form of tablets or lozenges which can be formulated in a conventional manner.

The amount of virus administered is typically in the range of from 10³ to 10¹⁰ pfu, preferably from 10⁵ to 10⁸ pfu, more preferably from 10⁶ to 10⁷ pfu. When injected, typically 1-10 μl of virus in a suitable pharmaceutically acceptable carrier or diluent is administered.

When the polynucleotide/vector is administered as a naked nucleic acid, the amount of nucleic acid administered is typically in the range of from 1 μg to 10 mg, preferably from 100 μg to 1 mg.

Where the first nucleotide sequence (or other therapeutic sequence) is under the control of an inducible regulatory sequence, it may only be necessary to induce gene expression for the duration of the treatment. Once the condition has been treated, the inducer is removed and expression of the NOI is stopped. This will clearly have clinical advantages. Such a system may, for example, involve administering the antibiotic tetracycline, to activate gene expression via its effect on the tet repressor/VP16 fusion protein.

The invention will now be further described by way of Examples, which are meant to serve to assist one of ordinary skill in the art in carrying out the invention and are not intended in any way to limit the scope of the invention. The Examples refer to the Figures. In the Figures:

FIG. 1 shows schematically ribozymes inserted into four different HIV vectors;

FIG. 2 shows schematically how to create a suitable 3′ LTR by PCR;

FIG. 3 shows the codon usage table for wild type HIV gag,pol of strain HXB2 (accession number: K03455);

FIG. 4 shows the codon usage table of the codon optimised sequence designated gag,pol-SYNgp;

FIG. 5 shows the codon usage table of the wild type HIV env called env-mn;

FIG. 6 shows the codon usage table of the codon optimised sequence of HIV env designated SYNgp160mn;

FIG. 7 shows three plasmid constructs for use in the invention;

FIG. 8 shows the principle behind two systems for producing retroviral vector particles;

FIG. 9A shows an EGS based on tyrosyl t-RNA;

FIG. 9B shows a consensus EGS sequence;

FIG. 10 shows twelve different anti-HIV EGS constructs;

FIG. 11 is a schematic representation of pDozenEgs and construction of pH4DozenEgs.

The invention will now be further described in the Examples which follow, which are intended as an illustration only and do not limit the scope of the invention.

EXAMPLES

Reference Example 1—Construction of a Ribozyme-encoding Genome

The HIV gag.pol sequence was codon optimised (FIG. 4 and SEQ I.D. No. 2) and synthesised using overlapping oligos of around 40 nucleotides. This has three advantages. Firstly, it allows an HIV based vector to carry ribozymes and other therapeutic factors. Secondly, the codon optimisation generates a higher vector titre due to a higher level of gene expression. Thirdly, gag.pol expression becomes rev independent, which allows the use of anti-rev or RRE factors.

Conserved sequences within gag.pol were identified by reference to the HIV Sequence database at Los Alamos National Laboratory (http://hiv-web.lanl.gov/) and used to design ribozymes. Because of the variability between subtypes of HIV-1, the ribozymes were designed to cleave the predominant subtype within North America, Latin America and the Caribbean, Europe, Japan and Australia; that is, subtype B. The sites chosen were cross-referenced with the synthetic gagpol sequence to ensure that there was a low possibility of cutting the codon optimised gagpol mRNA. The ribozymes were designed with XhoI and SalI sites at the 5′ and 3′ end respectively. This allows the construction of separate and tandem ribozymes.

The ribozymes are hammerhead (Riddell et al., 1996) structures of the following general structure:

(SEQ ID NO: 15)

   Helix I  Helix II                Helix III
5′-NNNNNNNN~ CUGAUGAGGCCGAAAGGCCGAA ~NNNNNNNN~

The catalytic domain of the ribozyme (Helix II) can tolerate some changes without reducing catalytic turnover.

The cleavage sites, targeting gag and pol with the essential GUX triplet (where X is any nucleotide base) are as follows:

GAG 1 5′ UAGUAAGAAUGUAUAGCCCUAC (SEQ ID NO: 16)

GAG 2 5′ AACCCAGAUUGUAAGACUAUUU (SEQ ID NO: 17)

GAG 3 5′ UGUUUCAAUUGUGGCAAAGAAG (SEQ ID NO: 18)

GAG 4 5′ AAAAAGGGCUGUUGGAAAUGUG (SEQ ID NO: 19)

POL 1 5′ ACGACCCCUCGUCACAAUAAAG (SEQ ID NO: 20)

POL 2 5′ GGAAUUGGAGGUUUUAUCAAAG (SEQ ID NO: 21)

POL 3 5′ AUAUUUUUCAGUUCCCUUAGAU (SEQ ID NO: 22)

POL 4 5′ UGGAUGAUUUGUAUGUAGGAUC (SEQ ID NO: 23)

POL 5 5′ CUUUGGAUGGGUUAUGAACUCC (SEQ ID NO: 24)

POL 6 5′ CAGCUGGACUGUCAAUGACAUA (SEQ ID NO: 25)

POL 7 5′ AACUUUCUAUGUAGAUGGGGCA (SEQ ID NO: 26)

POL 8 5′ AAGGCCGCCUGUUGGUGGGCAG (SEQ ID NO: 27)

POL 9 5′ UAAGACAGCAGUACAAAUGGCA (SEQ ID NO: 28)
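As a consistency check on the list above, each target site can be tested for the GUX triplet. The following is a minimal sketch; the function name and the assumption that the triplet always begins at the eleventh nucleotide of each 22-nucleotide site are drawn from inspection of the listed sequences rather than from any statement in the text.

```python
# Target sites as listed above (written 5' to 3').
TARGET_SITES = {
    "GAG 1": "UAGUAAGAAUGUAUAGCCCUAC",
    "GAG 2": "AACCCAGAUUGUAAGACUAUUU",
    "GAG 3": "UGUUUCAAUUGUGGCAAAGAAG",
    "GAG 4": "AAAAAGGGCUGUUGGAAAUGUG",
    "POL 1": "ACGACCCCUCGUCACAAUAAAG",
    "POL 2": "GGAAUUGGAGGUUUUAUCAAAG",
    "POL 3": "AUAUUUUUCAGUUCCCUUAGAU",
    "POL 4": "UGGAUGAUUUGUAUGUAGGAUC",
    "POL 5": "CUUUGGAUGGGUUAUGAACUCC",
    "POL 6": "CAGCUGGACUGUCAAUGACAUA",
    "POL 7": "AACUUUCUAUGUAGAUGGGGCA",
    "POL 8": "AAGGCCGCCUGUUGGUGGGCAG",
    "POL 9": "UAAGACAGCAGUACAAAUGGCA",
}

GUX_OFFSET = 10  # 0-based index of the G (assumption based on the listed sites)


def central_triplets(sites: dict) -> dict:
    """Map each site name to its triplet at GUX_OFFSET, keeping only
    sites that are 22 nt long and carry GU at that position."""
    result = {}
    for name, seq in sites.items():
        triplet = seq[GUX_OFFSET:GUX_OFFSET + 3]
        if len(seq) == 22 and triplet.startswith("GU"):
            result[name] = triplet
    return result


if __name__ == "__main__":
    for name, triplet in central_triplets(TARGET_SITES).items():
        print(name, triplet)
```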

The ribozymes are inserted into four different HIV vectors (pH4 (Gervaix et al., 1997), pH6, pH4.1, or pH6.1) (FIG. 1). In pH4 and pH6, transcription of the ribozymes is driven by an internal HCMV promoter (Foecking et al., 1986). From pH4.1 and pH6.1, the ribozymes are expressed from the 5′ LTR. The major difference between pH4 and pH6 (and pH4.1 and pH6.1) resides in the 3′ LTR in the production plasmid. pH4 and pH4.1 have the HIV U3 in the 3′ LTR. pH6 and pH6.1 have HCMV in the 3′ LTR. The HCMV promoter replaces most of the U3 and will drive expression at high constitutive levels while the HIV-1 U3 will support a high level of expression only in the presence of Tat.

The HCMV/HIV-1 hybrid 3′ LTR is created by recombinant PCR with three PCR primers (FIG. 2). The first round of PCR is performed with RIB1 and RIB2 using pH4 (Kim et al., 1998) as the template to amplify the HIV-1 HXB2 sequence 8900-9123. The second round of PCR makes the junction between the 5′ end of the HIV-1 U3 and the HCMV promoter by amplifying the hybrid 5′ LTR from pH4. The PCR product from the first PCR reaction and RIB3 serve as the 5′ primer and 3′ primer respectively.

RIB1: 5′ CAGCTGCTCGAGCAGCTGAAGCTTGCATGC 3′ (SEQ ID NO: 29)

RIB2: 5′ GTAAGTTATGTAACGGACGATATCTTGTCTTCTT 3′ (SEQ ID NO: 30)

RIB3: 5′ CGCATAGTCGACGGGCCCGCCACTGCTAGAGATTTTC 3′ (SEQ ID NO: 31)

The PCR product is then cut with SphI and SalI and inserted into pH4, thereby replacing the 3′ LTR. The resulting plasmid is designated pH6. To construct pH4.1 and pH6.1, the internal HCMV promoter (SpeI-XhoI) in pH4 and pH6 is replaced with the polycloning site of pBluescript II KS+ (Stratagene) (SpeI-XhoI).
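The restriction sites relied on in this cloning step can be located in the primers by a simple search. The sketch below assumes the commonly cited recognition sequences for SphI (GCATGC), SalI (GTCGAC) and XhoI (CTCGAG); because all three are palindromic, scanning one strand is sufficient. The function name is hypothetical.

```python
# Recognition sequences (assumed standard values for these enzymes).
RECOGNITION_SITES = {
    "XhoI": "CTCGAG",
    "SalI": "GTCGAC",
    "SphI": "GCATGC",
}

# Primers as given above (5' to 3').
PRIMERS = {
    "RIB1": "CAGCTGCTCGAGCAGCTGAAGCTTGCATGC",
    "RIB2": "GTAAGTTATGTAACGGACGATATCTTGTCTTCTT",
    "RIB3": "CGCATAGTCGACGGGCCCGCCACTGCTAGAGATTTTC",
}


def sites_in(seq: str) -> dict:
    """Map enzyme name to the 0-based position of its first site in seq."""
    seq = seq.upper()
    return {
        enzyme: seq.find(site)
        for enzyme, site in RECOGNITION_SITES.items()
        if site in seq
    }


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, sites_in(primer))
```

On this reading, RIB1 supplies the SphI site and RIB3 the SalI site used to insert the PCR product, while RIB2 carries none of the three sites.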

The ribozymes are inserted into the XhoI sites in the genome vector backbones. Any ribozymes in any configuration could be used in a similar way.

Reference Example 2—Construction of a Packaging System

The packaging system can take various forms. In a first form of packaging system, the HIV gag, pol components are co-expressed with the HIV env coding sequence. In this case, both the gag, pol and the env coding sequences are altered such that they are resistant to the anti-HIV ribozymes that are built into the genome. At the same time as altering the codon usage to achieve resistance, the codons can be chosen to match the usage pattern of the most highly expressed mammalian genes. This dramatically increases expression levels and so increases titre. A codon optimised HIV env coding sequence has been described by Haas et al. (1996). In the present example, a modified codon optimised HIV env sequence is used (SEQ I.D. No. 4). The corresponding env expression plasmid is designated pSYNgp160mn. The modified sequence contains extra motifs not used by Haas et al. The extra sequences were taken from the HIV env sequence of strain MN and codon optimised. Any similar modification of the nucleic acid sequence would function similarly as long as it used codons corresponding to abundant tRNAs (Zolotukhin et al., 1996) and led to resistance to the ribozymes in the genome.

In one example of a gag, pol coding sequence with optimised codon usage, overlapping oligonucleotides are synthesised and then ligated together to produce the synthetic coding sequence. The sequence of a wild-type (Genbank accession no. K03455) and synthetic (gagpol-SYNgp) gagpol sequence is shown in SEQ I.D. Nos 1 and 2, respectively and their codon usage is shown in FIGS. 3 and 4, respectively. The sequence of a wild type env coding sequence (Genbank Accession No. M17449) is given in SEQ I.D. No. 3, the sequence of a synthetic codon optimised sequence is given in SEQ I.D. No. 4 and their codon usage tables are given in FIGS. 5 and 6, respectively. As with the env coding sequence, any gag, pol sequence that achieves resistance to the ribozymes could be used. The synthetic sequence shown is designated gag, pol-SYNgp and has an EcoRI site at the 5′ end and a NotI site at the 3′ end. It is inserted into pCI-neo (Promega) to produce plasmid pSYNgp.

The sequence of the codon optimised gagpol sequence is shown in SEQ I.D. No. 2. This sequence starts at the ATG and ends at the stop codon of gagpol. The wild type sequence is retained around the frameshift site so that the right amount of gagpol is made.

In addition other constructs can be used that contain the optimised gagpol of pSYNgp but also have differing amounts of the wild type HIV-1 sequence of strain HXB2 (accession number: K03455) at the 5′ end. These constructs are described below (the start ATG of pSYNgp is shown in bold in these sequences).

pSYNgp2 contains the entire leader sequence of HIV-1 (SEQ ID. No. 12).

pSYNgp3 contains the leader sequence of HIV-1 from the major splice donor (SEQ ID. No. 13).

pSYNgp4 contains 20 bp of the leader sequence of HIV-1 upstream of the ATG start codon (SEQ ID. No. 14).

These constructs may be made by overlapping PCR. Using appropriate restriction enzymes these sequences can be inserted into mammalian expression vectors such as pCI-neo (Promega). All these gag/pol constructs can be used to supply HIV gag/pol for the generation of viral vectors. These viral vectors can be used to express either EGS molecules or ribozyme molecules or antisense molecules or any peptides or proteins.

In a second form of the packaging system a synthetic gag, pol cassette is coexpressed with a non-HIV envelope coding sequence that produces a surface protein that pseudotypes HIV. This could be for example VSV-G (Ory et al., 1996; Zhu et al., 1990), amphotropic MLV env (Chesebro et al., 1990; Spector et al., 1990) or any other protein that would be incorporated into the HIV particle (Valsesia-Wittmann et al., 1994). This includes molecules capable of targeting the vector to specific tissues. Coding sequences for non-HIV envelope proteins are not cleaved by the ribozymes and so no sequence modification is required (although some sequence modification may be desirable for other reasons such as optimisation for codon usage in mammalian cells).

Reference Example 3—Vector Particle Production

Vector particles can be produced either from a transient three-plasmid transfection system similar to that described by Soneoka et al. (1995) or from producer cell lines similar to those used for other retroviral vectors (Ory et al., 1996; Srinivasakumar et al., 1997; Yu et al., 1996). These principles are illustrated in FIGS. 7 and 8. For example, by using pH6Rz, pSYNgp and pRV67 (VSV-G expression plasmid) in a three plasmid transfection of 293T cells (FIG. 8), as described by Soneoka et al. (1995), vector particles designated H6Rz-VSV are produced. These transduce the H6Rz genome to CD4+ cells such as C8166 or Jurkat and produce the multitarget ribozymes. HIV replication in these cells is now severely restricted.

Example 1—Use of External Guide Sequences for Inhibiting HIV

Ribonuclease P is a nuclear localised enzyme consisting of protein and RNA subunits. It has been found in all organisms examined and is one of the most abundant, stable and efficient enzymes in cells. Its enzymatic activity is responsible for the maturation of the 5′ termini of all tRNAs, which account for about 2% of the total cellular RNA.

For tRNA processing, it has been shown that RNase P recognises a secondary structure of the tRNA. However, extensive studies have shown that any complex of two RNA molecules which resembles a single tRNA molecule will also be recognised and cleaved by RNase P. Consequently the natural activity of RNase P can be, and has been, successfully re-directed to target other RNA species (see Yuan and Altman, 1994, and references therein). This is achieved by engineering a sequence, containing the flanking motif recognised by RNase P, to bind the desired target sequence. These sequences are called external guide sequences (EGSs).

Outlined here is a strategy employing the EGS system against HIV RNA. Shown in FIG. 2A, B and C are twelve EGS sequences designed to target twelve separate HIV gag/pol sequences. These target sequences are conserved throughout clade B of HIV. The sequence numbering in each figure designates the position of the required conserved G of each target sequence based on the HXB2 published sequence.

The external guide sequences shown here all have anticodon stem-loops deleted. These are non-limiting examples; for instance full length 3/4 tRNA based EGSs might be used if preferred (see Yuan and Altman, 1994).

Outlined in SEQ ID. Nos. 5 to 10 (see below) and FIG. 11 is the cloning strategy employed to construct an HIV vector containing the EGSs described in SEQ ID. Nos. 5 to 10. The oligonucleotides prefixed 1, 2, 3, 4, 5 and 6 are respectively annealed together and sequentially cloned into the pSP72 (Promega) cloning vector, starting with the oligo duplex 1/1A being cloned into the XhoI-SalI site such that the EGS 4762 and EGS 4715 are orientated away from the ampicillin gene. The remaining oligonucleotides (with XhoI ends) are subsequently cloned stepwise (starting with oligo duplex 2/2A, ending with duplex 6/6A) into the unique SalI site (present within the terminus of each preceding oligonucleotide) to create the plasmid pDOZENEGS. The EGSs from this vector are then transferred by XhoI-SphI digest into the pH4Z similarly cut such that the multiple EGSs cassette replaces the lacZ gene of pH4Z (Kim et al., 1998). The resulting vector is named pH4DOZENEGS (see SEQ ID. No. 11 for complete sequence).
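The stepwise cloning described above depends on XhoI and SalI generating compatible cohesive ends. The following minimal sketch, which assumes the commonly cited cut positions C^TCGAG (XhoI) and G^TCGAC (SalI) and uses hypothetical function names, illustrates that both enzymes leave the same 5′ overhang and that the hybrid junction formed on ligation is recognised by neither enzyme, so that the SalI site carried by the newly inserted duplex is the only one available for the next step.

```python
# Recognition sequence and top-strand cut index for each enzyme
# (assumed standard values: C^TCGAG for XhoI and G^TCGAC for SalI).
ENZYMES = {
    "XhoI": ("CTCGAG", 1),
    "SalI": ("GTCGAC", 1),
}


def overhang(enzyme: str) -> str:
    """5' overhang left by cutting a palindromic site on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def junction(left: str, right: str) -> str:
    """Top-strand sequence formed when the upstream half of a `left` site
    is ligated to the downstream half of a `right` site."""
    left_site, left_cut = ENZYMES[left]
    right_site, right_cut = ENZYMES[right]
    return left_site[:left_cut] + right_site[right_cut:]


def recut_by(seq: str) -> list:
    """Names of the enzymes whose recognition sequence occurs in seq."""
    return [name for name, (site, _) in ENZYMES.items() if site in seq.upper()]


if __name__ == "__main__":
    print(overhang("XhoI"), overhang("SalI"))
    hybrid = junction("SalI", "XhoI")  # SalI-cut vector end joined to an XhoI-type insert end
    print(hybrid, recut_by(hybrid))
```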

Egs 1/1A (SEQ ID NO. 5)

(SEQ ID NO: 5) 5′-tcgagcccggggatgacgtcatcgacttcgaaggttcgaatccttctactgccaccattttttcgggcccctactgcagtagctgaagcttccaagcttaggaagatgacggtggtaaaaaactctacgtcatcgacttcgaaggttcgaatccttccctgtccaccagtcgacc-3′gagatgcagtagctgaagcttccaagcttaggaagggacaggtggtcagctggagct-5′ (SEQ ID NO: 32)

Egs 2/2A (SEQ ID NO. 6)

(SEQ ID NO. 6)5′-tcgagtattacgtcatcgacttcgaaggttcgaatccttctagattcaccattttttaggaacgcataatgcagtagctgaag cttccaagcttaggaagtactaagtggtaaaaaatccttgctcatcgacttcg aaggttcgaatccttccagttccaccagtcgacc-3′agtagctgaagcttccaagcttaggaaggtcaaggtggtcagctggagct-5′ (SEQ ID NO. 33)

Egs 3/3A (SEQ ID NO. 7)

(SEQ ID NO. 7)5′-tcgaggccaacgtcatcgacttcgaaggttcgaatccttctcttcccaccattttttttccccggttgcagtagctgaagcttccaagcttaggaagagaagggtggtaaaaaaaaggctgaacgtcatcgacttcgaaggttcgaatccttctgctgtcaccagtcgacc-3′gagatgcagtagctgaagcttccaagcttaggaagggacaggtggtcagctggagct-5′ (SEQ ID NO.34)

Egs 4/4A (SEQ ID NO. 8)

(SEQ ID NO. 8)5′-tcgagggctacgtcatcgacttcgaaggttcgaatccttcttgcttcaccattttttcccgatgcagtagctgaatgcttccaagcttaggaagaacgaagtggtaaaaaactgaacgtcatcgacttcgaaggttcgaatccttctgctgtcaccagtcgacc-3′gagatgcagtagctgaagcttccaagcttaggaagggacaggtggtcagctggagct-5′ (SEQ ID NO.35)

Egs 5/5A (SEQ ID NO. 9)

(SEQ ID NO. 9) 5′-tcgagtataacgtcatcgacttcgaaggttcgaatccttcaccggtcaccatttttttatacatattgcagtagctgaagcttccaagcttaggaagtggccagtggtaaaaaaatatacgtcatcgacttcgaaggttcgaatccttcttcttacaccagtcgacc-3′tgcagtagctgaagcttccaagcttaggaagaagaatgtggtcagctggagct-5′ (SEQ ID NO. 36)

Egs 6/6A (SEQ ID NO. 10)

(SEQ ID NO. 10) 5′-tcgagggctacgtcatcgacttcgaaggttcgaatccttcttgcttcaccattttttcccgatgcagtagctgaatgcttccaagcttaggaagaacgaagtggtaaaaaaacgtcatcgacttcgaaggttcgaatccttcttcttacaccagtcgacc-3′tgcagtagctgaagcttccaagcttaggaagatccgggtggtcagctgcgtacggagct-5′ (SEQ ID NO. 37)

The pH4DOZENEGS vector may be used to both deliver and express the example EGS sequences to appropriate eukaryotic cells in a manner as described for ribozymes in Reference Examples 1, 2 and 3, whereby the use of codon optimised gag/pol and env genes would prevent EGSs from targeting these genes during viral production. The inclusion of the EGS sequences into an HIV derived vector will not only allow expression of such sequences in the target cell but also packaging and transfer of such therapeutic sequences by the patient's own HIV. These example EGS sequences target HIV RNA for cleavage by RNase P. This example is not limiting and other suitable EGS and derived sequences may also be used, be they expressed singularly, in multiples, from pol I, pol II or pol III promoters and derivatives thereof and/or in combination with other HIV treatments. Other appropriate nucleotide sequences of interest (NOIs) may also be included in combination with EGSs if preferred.

All publications mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described methods and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.

References

Bender et al., 1987, J Virol 61: 1639-1646.

Chesebro, B., K. Wehrly, and W. Maury. 1990. J Virol. 64:4553-7.

Cosset et al., 1995, J. Virol. 69: 7430-7436.

Foecking, M. K., and H. Hofstetter. 1986. Gene. 45:101-105.

Forster and Altman, 1990, Science 249: 783-786.

Gervaix, A., X. Li, G. Kraus, and F. Wong-Staal. 1997. J Virol. 71:3048-53.

Goodchild, J., and V. Kohli. 1991. Arch Biochem Biophys. 284:386-391.

Haas, J., E.-C. Park, and B. Seed. 1996. Current Biology. 6:315.

Kawa et al., 1998, RNA 4: 1397-1406.

Kim, V. N., K. Mitrophanous, S. M. Kingsman, and A. J. Kingsman. 1998. J Virol. 72:811-816.

Lever, A. M. 1995. Br Med Bull. 51:149-66.

Ma et al., 1998, Antisense and Nucleic Acid Drug Development 8: 415-426.

Ory, D. S., B. A. Neugeboren, and R. C. Mulligan. 1996. Proc Natl Acad Sci U S A. 93:11400-6.

Pear et al., 1993, Proc Natl Acad Sci 90: 8392-8396.

Riddell, S. R., M. Elliott, D. A. Lewinsohn, M. J. Gilbert, L. Wilson, S. A. Manley, S. D. Lupton, R. W. Overell, T. C. Reynolds, L. Corey, and P. D. Greenberg. 1996. Nat Med. 2:216-23.

Soneoka, Y., P. M. Cannon, E. E. Ramsdale, J. C. Griffiths, G. Romano, S. M. Kingsman, and A. J. Kingsman. 1995. Nucleic Acids Res. 23:628-33.

Spector, D. H., E. Wade, D. A. Wright, V. Koval, C. Clark, D. Jaquish, and S. A. Spector. 1990. J Virol. 64:2298-2308.

Srinivasakumar, N., N. Chazal, C. Helga Maria, S. Prasad, M. L. Hammarskjold, and D. Rekosh. 1997. J Virol. 71:5841-8.

Valsesia-Wittmann, S., A. Drynda, G. Deleage, M. Aumailley, J. M. Heard, O. Danos, G. Verdier, and F. L. Cosset. 1994. J Virol. 68:4609-19.

Werner et al., 1997, Nucleic Acids Symposium Series No. 36: 19-21.

Werner et al., 1998, RNA 4: 847-855.

Yu, H., A. B. Rabson, M. Kaul, Y. Ron, and J. P. Dougherty. 1996. J Virol. 70:4530-37.

Yuan and Altman, 1994, Science 263:1269-1273.

Yuan and Altman, 1995, EMBO J. 14: 159-168.

Yuan et al., 1992, Proc Natl Acad Sci 89: 8006-8010.

Zhu, Z. H., S. S. Chen, and A. S. Huang. 1990. J Acquir Immune Defic Syndr. 3:215-9.

Zolotukhin, S., M. Potter, W. W. Hauswirth, J. Guy, and N. Muzyczka. 1996. J Virol. 70:4646-54.

73 1 4307 DNA Human immunodeficiency virus type 1 1 atgggtgcgagagcgtcagt attaagcggg ggagaattag atcgatggga aaaaattcgg 60 ttaaggccagggggaaagaa aaaatataaa ttaaaacata tagtatgggc aagcagggag 120 ctagaacgattcgcagttaa tcctggcctg ttagaaacat cagaaggctg tagacaaata 180 ctgggacagctacaaccatc ccttcagaca ggatcagaag aacttagatc attatataat 240 acagtagcaaccctctattg tgtgcatcaa aggatagaga taaaagacac caaggaagct 300 ttagacaagatagaggaaga gcaaaacaaa agtaagaaaa aagcacagca agcagcagct 360 gacacaggacacagcaatca ggtcagccaa aattacccta tagtgcagaa catccagggg 420 caaatggtacatcaggccat atcacctaga actttaaatg catgggtaaa agtagtagaa 480 gagaaggctttcagcccaga agtgataccc atgttttcag cattatcaga aggagccacc 540 ccacaagatttaaacaccat gctaaacaca gtggggggac atcaagcagc catgcaaatg 600 ttaaaagagaccatcaatga ggaagctgca gaatgggata gagtgcatcc agtgcatgca 660 gggcctattgcaccaggcca gatgagagaa ccaaggggaa gtgacatagc aggaactact 720 agtacccttcaggaacaaat aggatggatg acaaataatc cacctatccc agtaggagaa 780 atttataaaagatggataat cctgggatta aataaaatag taagaatgta tagccctacc 840 agcattctggacataagaca aggaccaaag gaacccttta gagactatgt agaccggttc 900 tataaaactctaagagccga gcaagcttca caggaggtaa aaaattggat gacagaaacc 960 ttgttggtccaaaatgcgaa cccagattgt aagactattt taaaagcatt gggaccagcg 1020 gctacactagaagaaatgat gacagcatgt cagggagtag gaggacccgg ccataaggca 1080 agagttttggctgaagcaat gagccaagta acaaattcag ctaccataat gatgcagaga 1140 ggcaattttaggaaccaaag aaagattgtt aagtgtttca attgtggcaa agaagggcac 1200 acagccagaaattgcagggc ccctaggaaa aagggctgtt ggaaatgtgg aaaggaagga 1260 caccaaatgaaagattgtac tgagagacag gctaattttt tagggaagat ctggccttcc 1320 tacaagggaaggccagggaa ttttcttcag agcagaccag agccaacagc cccaccagaa 1380 gagagcttcaggtctggggt agagacaaca actccccctc agaagcagga gccgatagac 1440 aaggaactgtatcctttaac ttccctcagg tcactctttg gcaacgaccc ctcgtcacaa 1500 taaagataggggggcaacta aaggaagctc tattagatac aggagcagat gatacagtat 1560 tagaagaaatgagtttgcca ggaagatgga aaccaaaaat gataggggga attggaggtt 1620 ttatcaaagtaagacagtat gatcagatac tcatagaaat ctgtggacat aaagctatag 1680 
gtacagtattagtaggacct acacctgtca acataattgg aagaaatctg ttgactcaga 1740 ttggttgcactttaaatttt cccattagcc ctattgagac tgtaccagta aaattaaagc 1800 caggaatggatggcccaaaa gttaaacaat ggccattgac agaagaaaaa ataaaagcat 1860 tagtagaaatttgtacagag atggaaaagg aagggaaaat ttcaaaaatt gggcctgaaa 1920 atccatacaatactccagta tttgccataa agaaaaaaga cagtactaaa tggagaaaat 1980 tagtagatttcagagaactt aataagagaa ctcaagactt ctgggaagtt caattaggaa 2040 taccacatcccgcagggtta aaaaagaaaa aatcagtaac agtactggat gtgggtgatg 2100 catatttttcagttccctta gatgaagact tcaggaagta tactgcattt accataccta 2160 gtataaacaatgagacacca gggattagat atcagtacaa tgtgcttcca cagggatgga 2220 aaggatcaccagcaatattc caaagtagca tgacaaaaat cttagagcct tttagaaaac 2280 aaaatccagacatagttatc tatcaataca tggatgattt gtatgtagga tctgacttag 2340 aaatagggcagcatagaaca aaaatagagg agctgagaca acatctgttg aggtggggac 2400 ttaccacaccagacaaaaaa catcagaaag aacctccatt cctttggatg ggttatgaac 2460 tccatcctgataaatggaca gtacagccta tagtgctgcc agaaaaagac agctggactg 2520 tcaatgacatacagaagtta gtggggaaat tgaattgggc aagtcagatt tacccaggga 2580 ttaaagtaaggcaattatgt aaactcctta gaggaaccaa agcactaaca gaagtaatac 2640 cactaacagaagaagcagag ctagaactgg cagaaaacag agagattcta aaagaaccag 2700 tacatggagtgtattatgac ccatcaaaag acttaatagc agaaatacag aagcaggggc 2760 aaggccaatggacatatcaa atttatcaag agccatttaa aaatctgaaa acaggaaaat 2820 atgcaagaatgaggggtgcc cacactaatg atgtaaaaca attaacagag gcagtgcaaa 2880 aaataaccacagaaagcata gtaatatggg gaaagactcc taaatttaaa ctgcccatac 2940 aaaaggaaacatgggaaaca tggtggacag agtattggca agccacctgg attcctgagt 3000 gggagtttgttaatacccct cccttagtga aattatggta ccagttagag aaagaaccca 3060 tagtaggagcagaaaccttc tatgtagatg gggcagctaa cagggagact aaattaggaa 3120 aagcaggatatgttactaat agaggaagac aaaaagttgt caccctaact gacacaacaa 3180 atcagaagactgagttacaa gcaatttatc tagctttgca ggattcggga ttagaagtaa 3240 acatagtaacagactcacaa tatgcattag gaatcattca agcacaacca gatcaaagtg 3300 aatcagagttagtcaatcaa ataatagagc agttaataaa aaaggaaaag gtctatctgg 3360 catgggtaccagcacacaaa ggaattggag 
gaaatgaaca agtagataaa ttagtcagtg 3420 ctggaatcaggaaagtacta tttttagatg gaatagataa ggcccaagat gaacatgaga 3480 aatatcacagtaattggaga gcaatggcta gtgattttaa cctgccacct gtagtagcaa 3540 aagaaatagtagccagctgt gataaatgtc agctaaaagg agaagccatg catggacaag 3600 tagactgtagtccaggaata tggcaactag attgtacaca tttagaagga aaagttatcc 3660 tggtagcagttcatgtagcc agtggatata tagaagcaga agttattcca gcagaaacag 3720 ggcaggaaacagcatatttt cttttaaaat tagcaggaag atggccagta aaaacaatac 3780 atactgacaatggcagcaat ttcaccggtg ctacggttag ggccgcctgt tggtgggcgg 3840 gaatcaagcaggaatttgga attccctaca atccccaaag tcaaggagta gtagaatcta 3900 tgaataaagaattaaagaaa attataggac aggtaagaga tcaggctgaa catcttaaga 3960 cagcagtacaaatggcagta ttcatccaca attttaaaag aaaagggggg attggggggt 4020 acagtgcaggggaaagaata gtagacataa tagcaacaga catacaaact aaagaattac 4080 aaaaacaaattacaaaaatt caaaattttc gggtttatta cagggacagc agaaattcac 4140 tttggaaaggaccagcaaag ctcctctgga aaggtgaagg ggcagtagta atacaagata 4200 atagtgacataaaagtagtg ccaagaagaa aagcaaagat cattagggat tatggaaaac 4260 agatggcaggtgatgattgt gtggcaagta gacaggatga ggattag 4307 2 4307 DNA ArtificialSequence Description of Artificial Sequence gagpol-SYNgp-codon optimisedgagpol sequence 2 atgggcgccc gcgccagcgt gctgtcgggc ggcgagctgg accgctgggagaagatccgc 60 ctgcgccccg gcggcaaaaa gaagtacaag ctgaagcaca tcgtgtgggccagccgcgaa 120 ctggagcgct tcgccgtgaa ccccgggctc ctggagacca gcgaggggtgccgccagatc 180 ctcggccaac tgcagcccag cctgcaaacc ggcagcgagg agctgcgcagcctgtacaac 240 accgtggcca cgctgtactg cgtccaccag cgcatcgaaa tcaaggatacgaaagaggcc 300 ctggataaaa tcgaagagga acagaataag agcaaaaaga aggcccaacaggccgccgcg 360 gacaccggac acagcaacca ggtcagccag aactacccca tcgtgcagaacatccagggg 420 cagatggtgc accaggccat ctccccccgc acgctgaacg cctgggtgaaggtggtggaa 480 gagaaggctt ttagcccgga ggtgataccc atgttctcag ccctgtcagagggagccacc 540 ccccaagatc tgaacaccat gctcaacaca gtggggggac accaggccgccatgcagatg 600 ctgaaggaga ccatcaatga ggaggctgcc gaatgggatc gtgtgcatccggtgcacgca 660 gggcccatcg caccgggcca gatgcgtgag ccacggggct 
cagacatcgccggaacgact 720 agtacccttc aggaacagat cggctggatg accaacaacc cacccatcccggtgggagaa 780 atctacaaac gctggatcat cctgggcctg aacaagatcg tgcgcatgtatagccctacc 840 agcatcctgg acatccgcca aggcccgaag gaaccctttc gcgactacgtggaccggttc 900 tacaaaacgc tccgcgccga gcaggctagc caggaggtga agaactggatgaccgaaacc 960 ctgctggtcc agaacgcgaa cccggactgc aagacgatcc tgaaggccctgggcccagcg 1020 gctaccctag aggaaatgat gaccgcctgt cagggagtgg gcggacccggccacaaggca 1080 cgcgtcctgg ctgaggccat gagccaggtg accaactccg ctaccatcatgatgcagcgc 1140 ggcaactttc ggaaccaacg caagatcgtc aagtgcttca actgtggcaaagaagggcac 1200 acagcccgca actgcagggc ccctaggaaa aagggctgct ggaaatgcggcaaggaaggc 1260 caccagatga aagactgtac tgagagacag gctaattttt tagggaagatctggccttcc 1320 tacaagggaa ggccagggaa ttttcttcag agcagaccag agccaacagccccaccagaa 1380 gagagcttca ggtctggggt agagacaaca actccccctc agaagcaggagccgatagac 1440 aaggaactgt atcctttaac ttccctcaga tcactctttg gcaacgacccctcgtcacaa 1500 taaagatagg ggggcagctc aaggaggctc tcctggacac cggagcagacgacaccgtgc 1560 tggaggagat gtcgttgcca ggccgctgga agccgaagat gatcgggggaatcggcggtt 1620 tcatcaaggt gcgccagtat gaccagatcc tcatcgaaat ctgcggccacaaggctatcg 1680 gtaccgtgct ggtgggcccc acacccgtca acatcatcgg acgcaacctgttgacgcaga 1740 tcggttgcac gctgaacttc cccattagcc ctatcgagac ggtaccggtgaagctgaagc 1800 ccgggatgga cggcccgaag gtcaagcaat ggccattgac agaggagaagatcaaggcac 1860 tggtggagat ttgcacagag atggaaaagg aagggaaaat ctccaagattgggcctgaga 1920 acccgtacaa cacgccggtg ttcgcaatca agaagaagga ctcgacgaaatggcgcaagc 1980 tggtggactt ccgcgagctg aacaagcgca cgcaagactt ctgggaggttcagctgggca 2040 tcccgcaccc cgcagggctg aagaagaaga aatccgtgac cgtactggatgtgggtgatg 2100 cctacttctc cgttcccctg gacgaagact tcaggaagta cactgccttcacaatccctt 2160 cgatcaacaa cgagacaccg gggattcgat atcagtacaa cgtgctgccccagggctgga 2220 aaggctctcc cgcaatcttc cagagtagca tgaccaaaat cctggagcctttccgcaaac 2280 agaaccccga catcgtcatc tatcagtaca tggatgactt gtacgtgggctctgatctag 2340 agatagggca gcaccgcacc aagatcgagg agctgcgcca gcacctgttgaggtggggac 2400 tgaccacacc 
cgacaagaag caccagaagg agcctccctt cctctggatgggttacgagc 2460 tgcaccctga caaatggacc gtgcagccta tcgtgctgcc agagaaagacagctggactg 2520 tcaacgacat acagaagctg gtggggaagt tgaactgggc cagtcagatttacccaggga 2580 ttaaggtgag gcagctgtgc aaactcctcc gcggaaccaa ggcactcacagaggtgatcc 2640 ccctaaccga ggaggccgag ctcgaactgg cagaaaaccg agagatcctaaaggagcccg 2700 tgcacggcgt gtactatgac ccctccaagg acctgatcgc cgagatccagaagcaggggc 2760 aaggccagtg gacctatcag atttaccagg agcccttcaa gaacctgaagaccggcaagt 2820 acgcccggat gaggggtgcc cacactaacg acgtcaagca gctgaccgaggccgtgcaga 2880 agatcaccac cgaaagcatc gtgatctggg gaaagactcc taagttcaagctgcccatcc 2940 agaaggaaac ctgggaaacc tggtggacag agtattggca ggccacctggattcctgagt 3000 gggagttcgt caacacccct cccctggtga agctgtggta ccagctggagaaggagccca 3060 tagtgggcgc cgaaaccttc tacgtggatg gggccgctaa cagggagactaagctgggca 3120 aagccggata cgtcactaac cggggcagac agaaggttgt caccctcactgacaccacca 3180 accagaagac tgagctgcag gccatttacc tcgctttgca ggactcgggcctggaggtga 3240 acatcgtgac agactctcag tatgccctgg gcatcattca agcccagccagaccagagtg 3300 agtccgagct ggtcaatcag atcatcgagc agctgatcaa gaaggaaaaggtctatctgg 3360 cctgggtacc cgcccacaaa ggcattggcg gcaatgagca ggtcgacaagctggtctcgg 3420 ctggcatcag gaaggtgcta ttcctggatg gcatcgacaa ggcccaggacgagcacgaga 3480 aataccacag caactggcgg gccatggcta gcgacttcaa cctgccccctgtggtggcca 3540 aagagatcgt ggccagctgt gacaagtgtc agctcaaggg cgaagccatgcatggccagg 3600 tggactgtag ccccggcatc tggcaactcg attgcaccca tctggagggcaaggttatcc 3660 tggtagccgt ccatgtggcc agtggctaca tcgaggccga ggtcattcccgccgaaacag 3720 ggcaggagac agcctacttc ctcctgaagc tggcaggccg gtggccagtgaagaccatcc 3780 atactgacaa tggcagcaat ttcaccagtg ctacggttaa ggccgcctgctggtgggcgg 3840 gaatcaagca ggagttcggg atcccctaca atccccagag tcagggcgtcgtcgagtcta 3900 tgaataagga gttaaagaag attatcggcc aggtcagaga tcaggctgagcatctcaaga 3960 ccgcggtcca aatggcggta ttcatccaca atttcaagcg gaagggggggattggggggt 4020 acagtgcggg ggagcggatc gtggacatca tcgcgaccga catccagactaaggagctgc 4080 aaaagcagat taccaagatt cagaatttcc gggtctacta 
cagggacagcagaaatcccc 4140 tctggaaagg cccagcgaag ctcctctgga agggtgaggg ggcagtagtgatccaggata 4200 atagcgacat caaggtggtg cccagaagaa aggcgaagat cattagggattatggcaaac 4260 agatggcggg tgatgattgc gtggcgagca gacaggatga ggattag 43073 2571 DNA Human immunodeficiency virus type 1 3 atgagagtga aggggatcaggaggaattat cagcactggt ggggatgggg cacgatgctc 60 cttgggttat taatgatctgtagtgctaca gaaaaattgt gggtcacagt ctattatggg 120 gtacctgtgt ggaaagaagcaaccaccact ctattttgtg catcagatgc taaagcatat 180 gatacagagg tacataatgtttgggccaca caagcctgtg tacccacaga ccccaaccca 240 caagaagtag aattggtaaatgtgacagaa aattttaaca tgtggaaaaa taacatggta 300 gaacagatgc atgaggatataatcagttta tgggatcaaa gcctaaagcc atgtgtaaaa 360 ttaaccccac tctgtgttactttaaattgc actgatttga ggaatactac taataccaat 420 aatagtactg ctaataacaatagtaatagc gagggaacaa taaagggagg agaaatgaaa 480 aactgctctt tcaatatcaccacaagcata agagataaga tgcagaaaga atatgcactt 540 ctttataaac ttgatatagtatcaatagat aatgatagta ccagctatag gttgataagt 600 tgtaatacct cagtcattacacaagcttgt ccaaagatat cctttgagcc aattcccata 660 cactattgtg ccccggctggttttgcgatt ctaaaatgta acgataaaaa gttcagtgga 720 aaaggatcat gtaaaaatgtcagcacagta caatgtacac atggaattag gccagtagta 780 tcaactcaac tgctgttaaatggcagtcta gcagaagaag aggtagtaat tagatctgag 840 aatttcactg ataatgctaaaaccatcata gtacatctga atgaatctgt acaaattaat 900 tgtacaagac ccaactacaataaaagaaaa aggatacata taggaccagg gagagcattt 960 tatacaacaa aaaatataataggaactata agacaagcac attgtaacat tagtagagca 1020 aaatggaatg acactttaagacagatagtt agcaaattaa aagaacaatt taagaataaa 1080 acaatagtct ttaatcaatcctcaggaggg gacccagaaa ttgtaatgca cagttttaat 1140 tgtggagggg aatttttctactgtaataca tcaccactgt ttaatagtac ttggaatggt 1200 aataatactt ggaataatactacagggtca aataacaata tcacacttca atgcaaaata 1260 aaacaaatta taaacatgtggcaggaagta ggaaaagcaa tgtatgcccc tcccattgaa 1320 ggacaaatta gatgttcatcaaatattaca gggctactat taacaagaga tggtggtaag 1380 gacacggaca cgaacgacaccgagatcttc agacctggag gaggagatat gagggacaat 1440 tggagaagtg aattatataaatataaagta gtaacaattg aaccattagg agtagcaccc 
1500 accaaggcaa agagaagagtggtgcagaga gaaaaaagag cagcgatagg agctctgttc 1560 cttgggttct taggagcagcaggaagcact atgggcgcag cgtcagtgac gctgacggta 1620 caggccagac tattattgtctggtatagtg caacagcaga acaatttgct gagggccatt 1680 gaggcgcaac agcatatgttgcaactcaca gtctggggca tcaagcagct ccaggcaaga 1740 gtcctggctg tggaaagatacctaaaggat caacagctcc tggggttttg gggttgctct 1800 ggaaaactca tttgcaccactactgtgcct tggaatgcta gttggagtaa taaatctctg 1860 gatgatattt ggaataacatgacctggatg cagtgggaaa gagaaattga caattacaca 1920 agcttaatat actcattactagaaaaatcg caaacccaac aagaaaagaa tgaacaagaa 1980 ttattggaat tggataaatgggcaagtttg tggaattggt ttgacataac aaattggctg 2040 tggtatataa aaatattcataatgatagta ggaggcttgg taggtttaag aatagttttt 2100 gctgtacttt ctatagtgaatagagttagg cagggatact caccattgtc gttgcagacc 2160 cgccccccag ttccgaggggacccgacagg cccgaaggaa tcgaagaaga aggtggagag 2220 agagacagag acacatccggtcgattagtg catggattct tagcaattat ctgggtcgac 2280 ctgcggagcc tgttcctcttcagctaccac cacagagact tactcttgat tgcagcgagg 2340 attgtggaac ttctgggacgcagggggtgg gaagtcctca aatattggtg gaatctccta 2400 cagtattgga gtcaggaactaaagagtagt gctgttagct tgcttaatgc cacagctata 2460 gcagtagctg aggggacagatagggttata gaagtactgc aaagagctgg tagagctatt 2520 ctccacatac ctacaagaataagacagggc ttggaaaggg ctttgctata a 2571 4 2571 DNA Artificial SequenceDescription of Artificial Sequence SYNgp-160mn-codon optimised envsequence 4 atgagggtga aggggatccg ccgcaactac cagcactggt ggggctggggcacgatgctc 60 ctggggctgc tgatgatctg cagcgccacc gagaagctgt gggtgaccgtgtactacggc 120 gtgcccgtgt ggaaggaggc caccaccacc ctgttctgcg ccagcgacgccaaggcgtac 180 gacaccgagg tgcacaacgt gtgggccacc caggcgtgcg tgcccaccgaccccaacccc 240 caggaggtgg agctcgtgaa cgtgaccgag aacttcaaca tgtggaagaacaacatggtg 300 gagcagatgc atgaggacat catcagcctg tgggaccaga gcctgaagccctgcgtgaag 360 ctgacccccc tgtgcgtgac cctgaactgc accgacctga ggaacaccaccaacaccaac 420 aacagcaccg ccaacaacaa cagcaacagc gagggcacca tcaagggcggcgagatgaag 480 aactgcagct tcaacatcac caccagcatc cgcgacaaga tgcagaaggagtacgccctg 540 ctgtacaagc 
tggatatcgt gagcatcgac aacgacagca ccagctaccgcctgatctcc 600 tgcaacacca gcgtgatcac ccaggcctgc cccaagatca gcttcgagcccatccccatc 660 cactactgcg cccccgccgg cttcgccatc ctgaagtgca acgacaagaagttcagcggc 720 aagggcagct gcaagaacgt gagcaccgtg cagtgcaccc acggcatccggccggtggtg 780 agcacccagc tcctgctgaa cggcagcctg gccgaggagg aggtggtgatccgcagcgag 840 aacttcaccg acaacgccaa gaccatcatc gtgcacctga atgagagcgtgcagatcaac 900 tgcacgcgtc ccaactacaa caagcgcaag cgcatccaca tcggccccgggcgcgccttc 960 tacaccacca agaacatcat cggcaccatc cgccaggccc actgcaacatctctagagcc 1020 aagtggaacg acaccctgcg ccagatcgtg agcaagctga aggagcagttcaagaacaag 1080 accatcgtgt tcaaccagag cagcggcggc gaccccgaga tcgtgatgcacagcttcaac 1140 tgcggcggcg aattcttcta ctgcaacacc agccccctgt tcaacagcacctggaacggc 1200 aacaacacct ggaacaacac caccggcagc aacaacaata ttaccctccagtgcaagatc 1260 aagcagatca tcaacatgtg gcaggaggtg ggcaaggcca tgtacgccccccccatcgag 1320 ggccagatcc ggtgcagcag caacatcacc ggtctgctgc tgacccgcgacggcggcaag 1380 gacaccgaca ccaacgacac cgaaatcttc cgccccggcg gcggcgacatgcgcgacaac 1440 tggagatctg agctgtacaa gtacaaggtg gtgacgatcg agcccctgggcgtggccccc 1500 accaaggcca agcgccgcgt ggtgcagcgc gagaagcggg ccgccatcggcgccctgttc 1560 ctgggcttcc tgggggcggc gggcagcacc atgggggccg ccagcgtgaccctgaccgtg 1620 caggcccgcc tgctcctgag cggcatcgtg cagcagcaga acaacctcctccgcgccatc 1680 gaggcccagc agcatatgct ccagctcacc gtgtggggca tcaagcagctccaggcccgc 1740 gtgctggccg tggagcgcta cctgaaggac cagcagctcc tgggcttctggggctgctcc 1800 ggcaagctga tctgcaccac cacggtaccc tggaacgcct cctggagcaacaagagcctg 1860 gacgacatct ggaacaacat gacctggatg cagtgggagc gcgagatcgataactacacc 1920 agcctgatct acagcctgct ggagaagagc cagacccagc aggagaagaacgagcaggag 1980 ctgctggagc tggacaagtg ggcgagcctg tggaactggt tcgacatcaccaactggctg 2040 tggtacatca aaatcttcat catgattgtg ggcggcctgg tgggcctccgcatcgtgttc 2100 gccgtgctga gcatcgtgaa ccgcgtgcgc cagggctaca gccccctgagcctccagacc 2160 cggccccccg tgccgcgcgg gcccgaccgc cccgagggca tcgaggaggagggcggcgag 2220 cgcgaccgcg acaccagcgg caggctcgtg cacggcttcc 
tggcgatcatctgggtcgac 2280 ctccgcagcc tgttcctgtt cagctaccac caccgcgacc tgctgctgatcgccgcccgc 2340 atcgtggaac tcctaggccg ccgcggctgg gaggtgctga agtactggtggaacctcctc 2400 cagtattgga gccaggagct gaagtccagc gccgtgagcc tgctgaacgccaccgccatc 2460 gccgtggccg agggcaccga ccgcgtgatc gaggtgctcc agagggccgggagggcgatc 2520 ctgcacatcc ccacccgcat ccgccagggg ctcgagaggg cgctgctgta a2571 5 116 DNA Artificial Sequence Description of Artificial SequenceSynthetic oligonucleotide 5 tcgagcccgg ggatgacgtc atcgacttcg aaggttcgaatccttctact gccaccattt 60 tttctctacg tcatcgactt cgaaggttcg aatccttccctgtccaccag tcgacc 116 6 110 DNA Artificial Sequence Description ofArtificial Sequence Synthetic oligonucleotide 6 tcgagtatta cgtcatcgacttcgaaggtt cgaatccttc tagattcacc attttttagg 60 aacgtcatcg acttcgaaggttcgaatcct tccagttcca ccagtcgacc 110 7 110 DNA Artificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 7tcgaggccaa cgtcatcgac ttcgaaggtt cgaatccttc tcttcccacc attttttttc 60cacgtcatcg acttcgaagg ttcgaatcct tcggggccca ccagtcgacc 110 8 110 DNAArtificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 8 tcgagggcta cgtcatcgac ttcgaaggtt cgaatccttc ttgcttcaccattttttctg 60 aacgtcatcg acttcgaagg ttcgaatcct tctgctgtca ccagtcgacc 1109 110 DNA Artificial Sequence Description of Artificial SequenceSynthetic oligonucleotide 9 tcgagtataa cgtcatcgac ttcgaaggtt cgaatccttcaccggtcacc atttttttat 60 aacgtcatcg acttcgaagg ttcgaatcct tcttcttacaccagtcgacc 110 10 116 DNA Artificial Sequence Description of ArtificialSequence Synthetic oligonucleotide 10 tcgaggtaca cgtcatcgac ttcgaaggttcgaatccttc gtagttcacc attttttgtg 60 cacgtcatcg acttcgaagg ttcgaatccttctaggccca ccagtcgacg catgcc 116 11 8560 DNA Artificial SequenceDescription of Artificial Sequence Synthetic nucleotide pH4DOZENEGSsequence 11 ctgacgcgcc ctgtagcggc gcattaagcg cggcgggtgt ggtggttacgcgcagcgtga 60 ccgctacact tgccagcgcc ctagcgcccg ctcctttcgc tttcttcccttcctttctcg 120 ccacgttcgc cggctttccc cgtcaagctc taaatcgggg 
gctccctttagggttccgat 180 ttagtgcttt acggcacctc gaccccaaaa aacttgatta gggtgatggttcacgtagtg 240 ggccatcgcc ctgatagacg gtttttcgcc ctttgacgtt ggagtccacgttctttaata 300 gtggactctt gttccaaact ggaacaacac tcaaccctat ctcggtctattcttttgatt 360 tataagggat tttgccgatt tcggcctatt ggttaaaaaa tgagctgatttaacaaaaat 420 ttaacgcgaa ttttaacaaa atattaacgc ttacaatttc cattcgccattcaggctgcg 480 caactgttgg gaagggcgat cggtgcgggc ctcttcgcta ttacgccagctggcgaaagg 540 gggatgtgct gcaaggcgat taagttgggt aacgccaggg ttttcccagtcacgacgttg 600 taaaacgacg gccagtgagc gcgcgtaata cgactcacta tagggcgaattggagctcca 660 ccgcggtggc ggccgctcta gagtccgtta cataacttac ggtaaatggcccgcctggct 720 gaccgcccaa cgacccccgc ccattgacgt caataatgac gtatgttcccatagtaacgc 780 caatagggac tttccattga cgtcaatggg tggagtattt acggtaaactgcccacttgg 840 cagtacatca agtgtatcat atgccaagta cgccccctat tgacgtcaatgacggtaaat 900 ggcccgcctg gcattatgcc cagtacatga ccttatggga ctttcctacttggcagtaca 960 tctacgtatt agtcatcgct attaccatgg tgatgcggtt ttggcagtacatcaatgggc 1020 gtggatagcg gtttgactca cggggatttc caagtctcca ccccattgacgtcaatggga 1080 gtttgttttg gcaccaaaat caacgggact ttccaaaatg tcgtaacaactccgccccat 1140 tgacgcaaat gggcggtagg cgtgtacggt gggaggtcta tataagcagagctcgtttag 1200 tgaaccggtc tctctggtta gaccagatct gagcctggga gctctctggctaactaggga 1260 acccactgct taagcctcaa taaagcttgc cttgagtgct tcaagtagtgtgtgcccgtc 1320 tgttgtgtga ctctggtaac tagagatccc tcagaccctt ttagtcagtgtggaaaatct 1380 ctagcagtgg cgcccgaaca gggacttgaa agcgaaaggg aaaccagaggagctctctcg 1440 acgcaggact cggcttgctg aagcgcgcac ggcaagaggc gaggggcggcgactggtgag 1500 tacgccaaaa attttgacta gcggaggcta gaaggagaga gatgggtgcgagagcgtcag 1560 tattaagcgg gggagaatta gatcgcgatg ggaaaaaatt cggttaaggccagggggaaa 1620 gaaaaaatat aaattaaaac atatagtatg ggcaagcagg gagctagaacgattcgcagt 1680 taatcctggc ctgttagaaa catcagaagg ctgtagacaa atactgggacagctacaacc 1740 atcccttcag acaggatcag aagaacttag atcattatat aatacagtagcaaccctcta 1800 ttgtgtgcat caaaggttga gataaaagac accaaggaag ctttagacaagatagaggga 1860 gagcaaaaca aaagtaagaa 
aaaagcacag caagcagcag ctgacacaggacacagcaat 1920 caggtcagcc aaaattaccc tatagtgcag aacatccagg ggcaaatggtacatcaggcc 1980 atatcaccta gaactttaaa tgcatgggta aaagtagtag aagagaaggctttcagccca 2040 gaagtgatac ccatgttttc agcattatca gaaggagcca ccccacaagatttaaacacc 2100 atgctaaaca cagtgggggg acatcaagca gccatgcaaa tgttaaaagagaccatcaat 2160 gaggaagctg caggaattcg cctaaaactg cttgtaccaa ttgctattgtaaaaagtgtt 2220 gctttcattg ccaagtttgt ttcataacaa aagccttagg catctcctatggcaggaaga 2280 agcggagaca gcgacgaaga gctcatcaga acagtcagac tcatcaagcttctctatcaa 2340 agcagtaagt agtacatgta acgcaaccta taccaatagt agcaatagtagcattagtag 2400 tagcaataat aatagcaata gttgtgtggt ccatagtaat catagaatataggaaaatat 2460 taagacaaag aaaaatagac aggttaattg atagactaat agaaagagcagaagacagtg 2520 gcaatgagag tgaaggagaa atatcagcac ttgtggagat gggggtggagatggggcacc 2580 atgctccttg ggatgttgat gatctgtagt gctacagaaa aattgtgggtcacagtctat 2640 tatggggtac ctgtgtggaa ggaagcaacc accactctat tttgtgcatcagatgctaaa 2700 gcatagatct tcagacttgg aggaggagat atgagggaca attggagaagtgaattatat 2760 aaatataaag tagtaaaaat tgaaccatta ggagtagcac ccaccaaggcaaagagaaga 2820 gtggtgcaga gagaaaaaag agcagtggga ataggagctt tgttccttgggttcttggga 2880 gcagcaggaa gcactatggg cgcagcgtca atgacgctga cggtacaggccagacaatta 2940 ttgtctggta tagtgcagca gcagaacaat ttgctgaggg ctattgaggcgcaacagcat 3000 ctgttgcaac tcacagtctg gggcatcaag cagctccagg caagaatcctggctgtggaa 3060 agatacctaa aggatcaaca gctcctgggg atttggggtt gctctggaaaactcatttgc 3120 accactgctg tgccttggaa tgctagttgg agtaataaat ctctggaacagatctggaat 3180 cacacgacct ggatggagtg ggacagagaa attaacaatt acacaagcttaatacactcc 3240 ttaattgaag aatcgcaaaa ccagcaagaa aagaatgaac aagaattattggaattagat 3300 aaatgggcaa gtttgtggaa ttggtttaac ataacaaatt ggctgtggtatataaaatta 3360 ttcataatga tagtaggagg cttggtaggt ttaagaatag tttttgctgtactttctata 3420 gtgaatagag ttaggcaggg atattcacca ttatcgtttc agacccacctcccaaccccg 3480 aggggacccg acaggcccga aggaatagaa gaagaaggtg gagagagagacagagacaga 3540 tccattcgat tagtgaacgg atccttggca cttatctggg 
acgatctgcggagcctgtgc 3600 ctcttcagct accaccgctt gagagactta ctcttgattg taacgaggattgtggaactt 3660 ctgggacgca gggggtggga agccctcaaa tattggtgga atctcctacagtattggagt 3720 caggaactaa agaatagtgc tgttagcttg ctcaatgcca cagccatagcagtagctgag 3780 gggacagata gggttataga agtagtacaa ggagcttgta gagctattcgccacatacct 3840 agaagaataa gacagggctt ggaaaggatt ttgctataag atgggtggcaagtggtcaaa 3900 aagtagtgtg attggatggc ctactgtaag ggaaagaatg agacgagctgagccagcagc 3960 agatagggtg ggagcagcat ctcgacgctg caggagtggg gaggcacgatggccgctttg 4020 gtcgaggcgg atccggccat tagccatatt attcattggt tatatagcataaatcaatat 4080 tggctattgg ccattgcata cgttgtatcc atatcataat atgtacatttatattggctc 4140 atgtccaaca ttaccgccat gttgacattg attattgact agttattaatagtaatcaat 4200 tacggggtca ttagttcata gcccatatat ggagttccgc gttacataacttacggtaaa 4260 tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataatgacgtatgt 4320 tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagtatttacggta 4380 aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgccccctattgacgt 4440 caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttatgggactttcc 4500 tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgcggttttggca 4560 gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtctccaccccat 4620 tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaaaatgtcgtaa 4680 caactccgcc ccattgacgc aaatgggcgg taggcatgta cggtgggaggtctatataag 4740 cagagctcgt ttagtgaacc gtcagatcgc ctggagacgc catccacgctgttttgacct 4800 ccatagaaga caccgggacc gatccagcct ccgcggcccc aagcttcagctgctcgagcc 4860 cggggatgac gtcatcgact tcgaaggttc gaatccttct actgccaccattttttctct 4920 acgtcatcga cttcgaaggt tcgaatcctt ccctgtccac cagtcgagtattacgtcatc 4980 gacttcgaag gttcgaatcc ttctagattc accatttttt aggaacgtcatcgacttcga 5040 aggttcgaat ccttccagtt ccaccagtcg aggccaacgt catcgacttcgaaggttcga 5100 atccttctct tcccaccatt ttttttccac gtcatcgact tcgaaggttcgaatccttcg 5160 gggcccacca gtcgagggct acgtcatcga cttcgaaggt tcgaatccttcttgcttcac 5220 cattttttct gaacgtcatc gacttcgaag gttcgaatcc ttctgctgtcaccagtcgag 5280 tataacgtca 
tcgacttcga aggttcgaat ccttcaccgg tcaccatttttttataacgt 5340 catcgacttc gaaggttcga atccttcttc ttacaccagt cgaggtacacgtcatcgact 5400 tcgaaggttc gaatccttcg tagttcacca ttttttgtgc acgtcatcgacttcgaaggt 5460 tcgaatcctt ctaggcccac cagtcgacgc atgcctgcag gtcgaggtcgataccgtcga 5520 gacctagaaa aacatggagc aatcacaagt agcaatacag cagctaccaatgctgattgt 5580 gcctggctag aagcacaaga ggaggaggag gtgggttttc cagtcacacctcaggtacct 5640 ttaagaccaa tgacttacaa ggcagctgta gatcttagcc actttttaaaagaaaagggg 5700 ggactggaag ggctaattca ctcccaacga agacaagata tccttgatctgtggatctac 5760 cacacacaag gctacttccc tgattggcag aactacacac cagggccagggatcagatat 5820 ccactgacct ttggatggtg ctacaagcta gtaccagttg agcaagagaaggtagaagaa 5880 gccaatgaag gagagaacac ccgcttgtta caccctgtga gcctgcatgggatggatgac 5940 ccggagagag aagtattaga gtggaggttt gacagccgcc tagcatttcatcacatggcc 6000 cgagagctgc atccggagta cttcaagaac tgctgacatc gagcttgctacaagggactt 6060 tccgctgggg actttccagg gaggcgtggc ctgggcggga ctggggagtggcgagccctc 6120 agatgctgca tataagcagc tgctttttgc ctgtactggg tctctctggttagaccagat 6180 ctgagcctgg gagctctctg gctaactagg gaacccactg cttaagcctcaataaagctt 6240 gccttgagtg cttcaagtag tgtgtgcccg tctgttgtgt gactctggtaactagagatc 6300 cctcagaccc ttttagtcag tgtggaaaat ctctagcagt cgagggggggcccggtaccc 6360 agcttttgtt ccctttagtg agggttaatt gcgcgcttgg cgtaatcatggtcatagctg 6420 tttcctgtgt gaaattgtta tccgctcaca attccacaca acatacgagccggaagcata 6480 aagtgtaaag cctggggtgc ctaatgagtg agctaactca cattaattgcgttgcgctca 6540 ctgcccgctt tccagtcggg aaacctgtcg tgccagctgc attaatgaatcggccaacgc 6600 gcggggagag gcggtttgcg tattgggcgc tcttccgctt cctcgctcactgactcgctg 6660 cgctcggtcg ttcggctgcg gcgagcggta tcagctcact caaaggcggtaatacggtta 6720 tccacagaat caggggataa cgcaggaaag aacatgtgag caaaaggccagcaaaaggcc 6780 aggaaccgta aaaaggccgc gttgctggcg tttttccata ggctccgcccccctgacgag 6840 catcacaaaa atcgacgctc aagtcagagg tggcgaaacc cgacaggactataaagatac 6900 caggcgtttc cccctggaag ctccctcgtg cgctctcctg ttccgaccctgccgcttacc 6960 ggatacctgt ccgcctttct cccttcggga agcgtggcgc 
tttctcatagctcacgctgt 7020 aggtatctca gttcggtgta ggtcgttcgc tccaagctgg gctgtgtgcacgaacccccc 7080 gttcagcccg accgctgcgc cttatccggt aactatcgtc ttgagtccaacccggtaaga 7140 cacgacttat cgccactggc agcagccact ggtaacagga ttagcagagcgaggtatgta 7200 ggcggtgcta cagagttctt gaagtggtgg cctaactacg gctacactagaaggacagta 7260 tttggtatct gcgctctgct gaagccagtt accttcggaa aaagagttggtagctcttga 7320 tccggcaaac aaaccaccgc tggtagcggt ggtttttttg tttgcaagcagcagattacg 7380 cgcagaaaaa aaggatctca agaagatcct ttgatctttt ctacggggtctgacgctcag 7440 tggaacgaaa actcacgtta agggattttg gtcatgagat tatcaaaaaggatcttcacc 7500 tagatccttt taaattaaaa atgaagtttt aaatcaatct aaagtatatatgagtaaact 7560 tggtctgaca gttaccaatg cttaatcagt gaggcaccta tctcagcgatctgtctattt 7620 cgttcatcca tagttgcctg actccccgtc gtgtagataa ctacgatacgggagggctta 7680 ccatctggcc ccagtgctgc aatgataccg cgagacccac gctcaccggctccagattta 7740 tcagcaataa accagccagc cggaagggcc gagcgcagaa gtggtcctgcaactttatcc 7800 gcctccatcc agtctattaa ttgttgccgg gaagctagag taagtagttcgccagttaat 7860 agtttgcgca acgttgttgc cattgctaca ggcatcgtgg tgtcacgctcgtcgtttggt 7920 atggcttcat tcagctccgg ttcccaacga tcaaggcgag ttacatgatcccccatgttg 7980 tgcaaaaaag cggttagctc cttcggtcct ccgatcgttg tcagaagtaagttggccgca 8040 gtgttatcac tcatggttat ggcagcactg cataattctc ttactgtcatgccatccgta 8100 agatgctttt ctgtgactgg tgagtactca accaagtcat tctgagaatagtgtatgcgg 8160 cgaccgagtt gctcttgccc ggcgtcaata cgggataata ccgcgccacatagcagaact 8220 ttaaaagtgc tcatcattgg aaaacgttct tcggggcgaa aactctcaaggatcttaccg 8280 ctgttgagat ccagttcgat gtaacccact cgtgcaccca actgatcttcagcatctttt 8340 actttcacca gcgtttctgg gtgagcaaaa acaggaaggc aaaatgccgcaaaaaaggga 8400 ataagggcga cacggaaatg ttgaatactc atactcttcc tttttcaatattattgaagc 8460 atttatcagg gttattgtct catgagcgga tacatatttg aatgtatttagaaaaataaa 8520 caaatagggg ttccgcgcac atttccccga aaagtgccac 8560 12 4642DNA Artificial Sequence Description of Artificial Sequence pSYNGP2-codon optimised HIV-1 gagpol with leader sequence 12 gggtctctctggttagacca gatctgagcc tgggagctct 
ctggctaact agggaaccca 60 ctgcttaagcctcaataaag cttgccttga gtgcttcaag tagtgtgtgc ccgtctgttg 120 tgtgactctggtaactagag atccctcaga cccttttagt cagtgtggaa aatctctagc 180 agtggcgcccgaacagggac ctgaaagcga aagggaaacc agagctctct cgacgcagga 240 ctcggcttgctgaagcgccc gcacggcaag aggcgagggg cggcgactgg tgagtacgcc 300 aaaaattttgactagcggag gctagaagga gagagatggg cgcccgcgcc agcgtgctgt 360 cgggcggcgagctggaccgc tgggagaaga tccgcctgcg ccccggcggc aaaaagaagt 420 acaagctgaagcacatcgtg tgggccagcc gcgaactgga gcgcttcgcc gtgaaccccg 480 ggctcctggagaccagcgag gggtgccgcc agatcctcgg ccaactgcag cccagcctgc 540 aaaccggcagcgaggagctg cgcagcctgt acaacaccgt ggccacgctg tactgcgtcc 600 accagcgcatcgaaatcaag gatacgaaag aggccctgga taaaatcgaa gaggaacaga 660 ataagagcaaaaagaaggcc caacaggccg ccgcggacac cggacacagc aaccaggtca 720 gccagaactaccccatcgtg cagaacatcc aggggcagat ggtgcaccag gccatctccc 780 cccgcacgctgaacgcctgg gtgaaggtgg tggaagagaa ggcttttagc ccggaggtga 840 tacccatgttctcagccctg tcagagggag ccacccccca agatctgaac accatgctca 900 acacagtggggggacaccag gccgccatgc agatgctgaa ggagaccatc aatgaggagg 960 ctgccgaatgggatcgtgtg catccggtgc acgcagggcc catcgcaccg ggccagatgc 1020 gtgagccacggggctcagac atcgccggaa cgactagtac ccttcaggaa cagatcggct 1080 ggatgaccaacaacccaccc atcccggtgg gagaaatcta caaacgctgg atcatcctgg 1140 gcctgaacaagatcgtgcgc atgtatagcc ctaccagcat cctggacatc cgccaaggcc 1200 cgaaggaaccctttcgcgac tacgtggacc ggttctacaa aacgctccgc gccgagcagg 1260 ctagccaggaggtgaagaac tggatgaccg aaaccctgct ggtccagaac gcgaacccgg 1320 actgcaagacgatcctgaag gccctgggcc cagcggctac cctagaggaa atgatgaccg 1380 cctgtcagggagtgggcgga cccggccaca aggcacgcgt cctggctgag gccatgagcc 1440 aggtgaccaactccgctacc atcatgatgc agcgcggcaa ctttcggaac caacgcaaga 1500 tcgtcaagtgcttcaactgt ggcaaagaag ggcacacagc ccgcaactgc agggccccta 1560 ggaaaaagggctgttggaaa tgtggaaagg aaggacacca aatgaaagat tgtactgaga 1620 gacaggctaattttttaggg aagatctggc cttcccacaa gggaaggcca gggaattttc 1680 ttcagagcagaccagagcca acagccccac cagaagagag cttcaggttt ggggaagaga 1740 caacaactccctctcagaag 
caggagccga tagacaagga actgtatcct ttagcttccc 1800 tcagatcactctttggcagc gacccctcgt cacaataaag ataggggggc agctcaagga 1860 ggctctcctggacaccggag cagacgacac cgtgctggag gagatgtcgt tgccaggccg 1920 ctggaagccgaagatgatcg ggggaatcgg cggtttcatc aaggtgcgcc agtatgacca 1980 gatcctcatcgaaatctgcg gccacaaggc tatcggtacc gtgctggtgg gccccacacc 2040 cgtcaacatcatcggacgca acctgttgac gcagatcggt tgcacgctga acttccccat 2100 tagccctatcgagacggtac cggtgaagct gaagcccggg atggacggcc cgaaggtcaa 2160 gcaatggccattgacagagg agaagatcaa ggcactggtg gagatttgca cagagatgga 2220 aaaggaagggaaaatctcca agattgggcc tgagaacccg tacaacacgc cggtgttcgc 2280 aatcaagaagaaggactcga cgaaatggcg caagctggtg gacttccgcg agctgaacaa 2340 gcgcacgcaagacttctggg aggttcagct gggcatcccg caccccgcag ggctgaagaa 2400 gaagaaatccgtgaccgtac tggatgtggg tgatgcctac ttctccgttc ccctggacga 2460 agacttcaggaagtacactg ccttcacaat cccttcgatc aacaacgaga caccggggat 2520 tcgatatcagtacaacgtgc tgccccaggg ctggaaaggc tctcccgcaa tcttccagag 2580 tagcatgaccaaaatcctgg agcctttccg caaacagaac cccgacatcg tcatctatca 2640 gtacatggatgacttgtacg tgggctctga tctagagata gggcagcacc gcaccaagat 2700 cgaggagctgcgccagcacc tgttgaggtg gggactgacc acacccgaca agaagcacca 2760 gaaggagcctcccttcctct ggatgggtta cgagctgcac cctgacaaat ggaccgtgca 2820 gcctatcgtgctgccagaga aagacagctg gactgtcaac gacatacaga agctggtggg 2880 gaagttgaactgggccagtc agatttaccc agggattaag gtgaggcagc tgtgcaaact 2940 cctccgcggaaccaaggcac tcacagaggt gatcccccta accgaggagg ccgagctcga 3000 actggcagaaaaccgagaga tcctaaagga gcccgtgcac ggcgtgtact atgacccctc 3060 caaggacctgatcgccgaga tccagaagca ggggcaaggc cagtggacct atcagattta 3120 ccaggagcccttcaagaacc tgaagaccgg caagtacgcc cggatgaggg gtgcccacac 3180 taacgacgtcaagcagctga ccgaggccgt gcagaagatc accaccgaaa gcatcgtgat 3240 ctggggaaagactcctaagt tcaagctgcc catccagaag gaaacctggg aaacctggtg 3300 gacagagtattggcaggcca cctggattcc tgagtgggag ttcgtcaaca cccctcccct 3360 ggtgaagctgtggtaccagc tggagaagga gcccatagtg ggcgccgaaa ccttctacgt 3420 ggatggggccgctaacaggg agactaagct gggcaaagcc ggatacgtca 
ctaaccgggg 3480 cagacagaaggttgtcaccc tcactgacac caccaaccag aagactgagc tgcaggccat 3540 ttacctcgctttgcaggact cgggcctgga ggtgaacatc gtgacagact ctcagtatgc 3600 cctgggcatcattcaagccc agccagacca gagtgagtcc gagctggtca atcagatcat 3660 cgagcagctgatcaagaagg aaaaggtcta tctggcctgg gtacccgccc acaaaggcat 3720 tggcggcaatgagcaggtcg acaagctggt ctcggctggc atcaggaagg tgctattcct 3780 ggatggcatcgacaaggccc aggacgagca cgagaaatac cacagcaact ggcgggccat 3840 ggctagcgacttcaacctgc cccctgtggt ggccaaagag atcgtggcca gctgtgacaa 3900 gtgtcagctcaagggcgaag ccatgcatgg ccaggtggac tgtagccccg gcatctggca 3960 actcgattgcacccatctgg agggcaaggt tatcctggta gccgtccatg tggccagtgg 4020 ctacatcgaggccgaggtca ttcccgccga aacagggcag gagacagcct acttcctcct 4080 gaagctggcaggccggtggc cagtgaagac catccatact gacaatggca gcaatttcac 4140 cagtgctacggttaaggccg cctgctggtg ggcgggaatc aagcaggagt tcgggatccc 4200 ctacaatccccagagtcagg gcgtcgtcga gtctatgaat aaggagttaa agaagattat 4260 cggccaggtcagagatcagg ctgagcatct caagaccgcg gtccaaatgg cggtattcat 4320 ccacaatttcaagcggaagg gggggattgg ggggtacagt gcgggggagc ggatcgtgga 4380 catcatcgcgaccgacatcc agactaagga gctgcaaaag cagattacca agattcagaa 4440 tttccgggtctactacaggg acagcagaaa tcccctctgg aaaggcccag cgaagctcct 4500 ctggaagggtgagggggcag tagtgatcca ggataatagc gacatcaagg tggtgcccag 4560 aagaaaggcgaagatcatta gggattatgg caaacagatg gcgggtgatg attgcgtggc 4620 gagcagacaggatgaggatt ag 4642 13 4353 DNA Artificial Sequence Description ofArtificial Sequence pSYNGP3- codon optimised HIV-1 gagpol with leadersequence from the major splice donor 13 gtgagtacgc caaaaatttt gactagcggaggctagaagg agagagatgg gcgcccgcgc 60 cagcgtgctg tcgggcggcg agctggaccgctgggagaag atccgcctgc gccccggcgg 120 caaaaagaag tacaagctga agcacatcgtgtgggccagc cgcgaactgg agcgcttcgc 180 cgtgaacccc gggctcctgg agaccagcgaggggtgccgc cagatcctcg gccaactgca 240 gcccagcctg caaaccggca gcgaggagctgcgcagcctg tacaacaccg tggccacgct 300 gtactgcgtc caccagcgca tcgaaatcaaggatacgaaa gaggccctgg ataaaatcga 360 agaggaacag aataagagca aaaagaaggcccaacaggcc gccgcggaca 
ccggacacag 420 caaccaggtc agccagaact accccatcgtgcagaacatc caggggcaga tggtgcacca 480 ggccatctcc ccccgcacgc tgaacgcctgggtgaaggtg gtggaagaga aggcttttag 540 cccggaggtg atacccatgt tctcagccctgtcagaggga gccacccccc aagatctgaa 600 caccatgctc aacacagtgg ggggacaccaggccgccatg cagatgctga aggagaccat 660 caatgaggag gctgccgaat gggatcgtgtgcatccggtg cacgcagggc ccatcgcacc 720 gggccagatg cgtgagccac ggggctcagacatcgccgga acgactagta cccttcagga 780 acagatcggc tggatgacca acaacccacccatcccggtg ggagaaatct acaaacgctg 840 gatcatcctg ggcctgaaca agatcgtgcgcatgtatagc cctaccagca tcctggacat 900 ccgccaaggc ccgaaggaac cctttcgcgactacgtggac cggttctaca aaacgctccg 960 cgccgagcag gctagccagg aggtgaagaactggatgacc gaaaccctgc tggtccagaa 1020 cgcgaacccg gactgcaaga cgatcctgaaggccctgggc ccagcggcta ccctagagga 1080 aatgatgacc gcctgtcagg gagtgggcggacccggccac aaggcacgcg tcctggctga 1140 ggccatgagc caggtgacca actccgctaccatcatgatg cagcgcggca actttcggaa 1200 ccaacgcaag atcgtcaagt gcttcaactgtggcaaagaa gggcacacag cccgcaactg 1260 cagggcccct aggaaaaagg gctgttggaaatgtggaaag gaaggacacc aaatgaaaga 1320 ttgtactgag agacaggcta attttttagggaagatctgg ccttcccaca agggaaggcc 1380 agggaatttt cttcagagca gaccagagccaacagcccca ccagaagaga gcttcaggtt 1440 tggggaagag acaacaactc cctctcagaagcaggagccg atagacaagg aactgtatcc 1500 tttagcttcc ctcagatcac tctttggcagcgacccctcg tcacaataaa gatagggggg 1560 cagctcaagg aggctctcct ggacaccggagcagacgaca ccgtgctgga ggagatgtcg 1620 ttgccaggcc gctggaagcc gaagatgatcgggggaatcg gcggtttcat caaggtgcgc 1680 cagtatgacc agatcctcat cgaaatctgcggccacaagg ctatcggtac cgtgctggtg 1740 ggccccacac ccgtcaacat catcggacgcaacctgttga cgcagatcgg ttgcacgctg 1800 aacttcccca ttagccctat cgagacggtaccggtgaagc tgaagcccgg gatggacggc 1860 ccgaaggtca agcaatggcc attgacagaggagaagatca aggcactggt ggagatttgc 1920 acagagatgg aaaaggaagg gaaaatctccaagattgggc ctgagaaccc gtacaacacg 1980 ccggtgttcg caatcaagaa gaaggactcgacgaaatggc gcaagctggt ggacttccgc 2040 gagctgaaca agcgcacgca agacttctgggaggttcagc tgggcatccc gcaccccgca 2100 gggctgaaga agaagaaatc 
cgtgaccgtactggatgtgg gtgatgccta cttctccgtt 2160 cccctggacg aagacttcag gaagtacactgccttcacaa tcccttcgat caacaacgag 2220 acaccgggga ttcgatatca gtacaacgtgctgccccagg gctggaaagg ctctcccgca 2280 atcttccaga gtagcatgac caaaatcctggagcctttcc gcaaacagaa ccccgacatc 2340 gtcatctatc agtacatgga tgacttgtacgtgggctctg atctagagat agggcagcac 2400 cgcaccaaga tcgaggagct gcgccagcacctgttgaggt ggggactgac cacacccgac 2460 aagaagcacc agaaggagcc tcccttcctctggatgggtt acgagctgca ccctgacaaa 2520 tggaccgtgc agcctatcgt gctgccagagaaagacagct ggactgtcaa cgacatacag 2580 aagctggtgg ggaagttgaa ctgggccagtcagatttacc cagggattaa ggtgaggcag 2640 ctgtgcaaac tcctccgcgg aaccaaggcactcacagagg tgatccccct aaccgaggag 2700 gccgagctcg aactggcaga aaaccgagagatcctaaagg agcccgtgca cggcgtgtac 2760 tatgacccct ccaaggacct gatcgccgagatccagaagc aggggcaagg ccagtggacc 2820 tatcagattt accaggagcc cttcaagaacctgaagaccg gcaagtacgc ccggatgagg 2880 ggtgcccaca ctaacgacgt caagcagctgaccgaggccg tgcagaagat caccaccgaa 2940 agcatcgtga tctggggaaa gactcctaagttcaagctgc ccatccagaa ggaaacctgg 3000 gaaacctggt ggacagagta ttggcaggccacctggattc ctgagtggga gttcgtcaac 3060 acccctcccc tggtgaagct gtggtaccagctggagaagg agcccatagt gggcgccgaa 3120 accttctacg tggatggggc cgctaacagggagactaagc tgggcaaagc cggatacgtc 3180 actaaccggg gcagacagaa ggttgtcaccctcactgaca ccaccaacca gaagactgag 3240 ctgcaggcca tttacctcgc tttgcaggactcgggcctgg aggtgaacat cgtgacagac 3300 tctcagtatg ccctgggcat cattcaagcccagccagacc agagtgagtc cgagctggtc 3360 aatcagatca tcgagcagct gatcaagaaggaaaaggtct atctggcctg ggtacccgcc 3420 cacaaaggca ttggcggcaa tgagcaggtcgacaagctgg tctcggctgg catcaggaag 3480 gtgctattcc tggatggcat cgacaaggcccaggacgagc acgagaaata ccacagcaac 3540 tggcgggcca tggctagcga cttcaacctgccccctgtgg tggccaaaga gatcgtggcc 3600 agctgtgaca agtgtcagct caagggcgaagccatgcatg gccaggtgga ctgtagcccc 3660 ggcatctggc aactcgattg cacccatctggagggcaagg ttatcctggt agccgtccat 3720 gtggccagtg gctacatcga ggccgaggtcattcccgccg aaacagggca ggagacagcc 3780 tacttcctcc tgaagctggc aggccggtggccagtgaaga ccatccatac 
tgacaatggc 3840 agcaatttca ccagtgctac ggttaaggccgcctgctggt gggcgggaat caagcaggag 3900 ttcgggatcc cctacaatcc ccagagtcagggcgtcgtcg agtctatgaa taaggagtta 3960 aagaagatta tcggccaggt cagagatcaggctgagcatc tcaagaccgc ggtccaaatg 4020 gcggtattca tccacaattt caagcggaagggggggattg gggggtacag tgcgggggag 4080 cggatcgtgg acatcatcgc gaccgacatccagactaagg agctgcaaaa gcagattacc 4140 aagattcaga atttccgggt ctactacagggacagcagaa atcccctctg gaaaggccca 4200 gcgaagctcc tctggaaggg tgagggggcagtagtgatcc aggataatag cgacatcaag 4260 gtggtgccca gaagaaaggc gaagatcattagggattatg gcaaacagat ggcgggtgat 4320 gattgcgtgg cgagcagaca ggatgaggattag 4353 14 4327 DNA Artificial Sequence Description of ArtificialSequence pSYNGP4- codon optimised HIV-1 gagpol with 20bp of the leadersequence of HIV-1 14 cggaggctag aaggagagag atgggcgccc gcgccagcgtgctgtcgggc ggcgagctgg 60 accgctggga gaagatccgc ctgcgccccg gcggcaaaaagaagtacaag ctgaagcaca 120 tcgtgtgggc cagccgcgaa ctggagcgct tcgccgtgaaccccgggctc ctggagacca 180 gcgaggggtg ccgccagatc ctcggccaac tgcagcccagcctgcaaacc ggcagcgagg 240 agctgcgcag cctgtacaac accgtggcca cgctgtactgcgtccaccag cgcatcgaaa 300 tcaaggatac gaaagaggcc ctggataaaa tcgaagaggaacagaataag agcaaaaaga 360 aggcccaaca ggccgccgcg gacaccggac acagcaaccaggtcagccag aactacccca 420 tcgtgcagaa catccagggg cagatggtgc accaggccatctccccccgc acgctgaacg 480 cctgggtgaa ggtggtggaa gagaaggctt ttagcccggaggtgataccc atgttctcag 540 ccctgtcaga gggagccacc ccccaagatc tgaacaccatgctcaacaca gtggggggac 600 accaggccgc catgcagatg ctgaaggaga ccatcaatgaggaggctgcc gaatgggatc 660 gtgtgcatcc ggtgcacgca gggcccatcg caccgggccagatgcgtgag ccacggggct 720 cagacatcgc cggaacgact agtacccttc aggaacagatcggctggatg accaacaacc 780 cacccatccc ggtgggagaa atctacaaac gctggatcatcctgggcctg aacaagatcg 840 tgcgcatgta tagccctacc agcatcctgg acatccgccaaggcccgaag gaaccctttc 900 gcgactacgt ggaccggttc tacaaaacgc tccgcgccgagcaggctagc caggaggtga 960 agaactggat gaccgaaacc ctgctggtcc agaacgcgaacccggactgc aagacgatcc 1020 tgaaggccct gggcccagcg gctaccctag aggaaatgatgaccgcctgt 
cagggagtgg 1080 gcggacccgg ccacaaggca cgcgtcctgg ctgaggccatgagccaggtg accaactccg 1140 ctaccatcat gatgcagcgc ggcaactttc ggaaccaacgcaagatcgtc aagtgcttca 1200 actgtggcaa agaagggcac acagcccgca actgcagggcccctaggaaa aagggctgtt 1260 ggaaatgtgg aaaggaagga caccaaatga aagattgtactgagagacag gctaattttt 1320 tagggaagat ctggccttcc cacaagggaa ggccagggaattttcttcag agcagaccag 1380 agccaacagc cccaccagaa gagagcttca ggtttggggaagagacaaca actccctctc 1440 agaagcagga gccgatagac aaggaactgt atcctttagcttccctcaga tcactctttg 1500 gcagcgaccc ctcgtcacaa taaagatagg ggggcagctcaaggaggctc tcctggacac 1560 cggagcagac gacaccgtgc tggaggagat gtcgttgccaggccgctgga agccgaagat 1620 gatcggggga atcggcggtt tcatcaaggt gcgccagtatgaccagatcc tcatcgaaat 1680 ctgcggccac aaggctatcg gtaccgtgct ggtgggccccacacccgtca acatcatcgg 1740 acgcaacctg ttgacgcaga tcggttgcac gctgaacttccccattagcc ctatcgagac 1800 ggtaccggtg aagctgaagc ccgggatgga cggcccgaaggtcaagcaat ggccattgac 1860 agaggagaag atcaaggcac tggtggagat ttgcacagagatggaaaagg aagggaaaat 1920 ctccaagatt gggcctgaga acccgtacaa cacgccggtgttcgcaatca agaagaagga 1980 ctcgacgaaa tggcgcaagc tggtggactt ccgcgagctgaacaagcgca cgcaagactt 2040 ctgggaggtt cagctgggca tcccgcaccc cgcagggctgaagaagaaga aatccgtgac 2100 cgtactggat gtgggtgatg cctacttctc cgttcccctggacgaagact tcaggaagta 2160 cactgccttc acaatccctt cgatcaacaa cgagacaccggggattcgat atcagtacaa 2220 cgtgctgccc cagggctgga aaggctctcc cgcaatcttccagagtagca tgaccaaaat 2280 cctggagcct ttccgcaaac agaaccccga catcgtcatctatcagtaca tggatgactt 2340 gtacgtgggc tctgatctag agatagggca gcaccgcaccaagatcgagg agctgcgcca 2400 gcacctgttg aggtggggac tgaccacacc cgacaagaagcaccagaagg agcctccctt 2460 cctctggatg ggttacgagc tgcaccctga caaatggaccgtgcagccta tcgtgctgcc 2520 agagaaagac agctggactg tcaacgacat acagaagctggtggggaagt tgaactgggc 2580 cagtcagatt tacccaggga ttaaggtgag gcagctgtgcaaactcctcc gcggaaccaa 2640 ggcactcaca gaggtgatcc ccctaaccga ggaggccgagctcgaactgg cagaaaaccg 2700 agagatccta aaggagcccg tgcacggcgt gtactatgacccctccaagg acctgatcgc 2760 cgagatccag aagcaggggc 
aaggccagtg gacctatcagatttaccagg agcccttcaa 2820 gaacctgaag accggcaagt acgcccggat gaggggtgcccacactaacg acgtcaagca 2880 gctgaccgag gccgtgcaga agatcaccac cgaaagcatcgtgatctggg gaaagactcc 2940 taagttcaag ctgcccatcc agaaggaaac ctgggaaacctggtggacag agtattggca 3000 ggccacctgg attcctgagt gggagttcgt caacacccctcccctggtga agctgtggta 3060 ccagctggag aaggagccca tagtgggcgc cgaaaccttctacgtggatg gggccgctaa 3120 cagggagact aagctgggca aagccggata cgtcactaaccggggcagac agaaggttgt 3180 caccctcact gacaccacca accagaagac tgagctgcaggccatttacc tcgctttgca 3240 ggactcgggc ctggaggtga acatcgtgac agactctcagtatgccctgg gcatcattca 3300 agcccagcca gaccagagtg agtccgagct ggtcaatcagatcatcgagc agctgatcaa 3360 gaaggaaaag gtctatctgg cctgggtacc cgcccacaaaggcattggcg gcaatgagca 3420 ggtcgacaag ctggtctcgg ctggcatcag gaaggtgctattcctggatg gcatcgacaa 3480 ggcccaggac gagcacgaga aataccacag caactggcgggccatggcta gcgacttcaa 3540 cctgccccct gtggtggcca aagagatcgt ggccagctgtgacaagtgtc agctcaaggg 3600 cgaagccatg catggccagg tggactgtag ccccggcatctggcaactcg attgcaccca 3660 tctggagggc aaggttatcc tggtagccgt ccatgtggccagtggctaca tcgaggccga 3720 ggtcattccc gccgaaacag ggcaggagac agcctacttcctcctgaagc tggcaggccg 3780 gtggccagtg aagaccatcc atactgacaa tggcagcaatttcaccagtg ctacggttaa 3840 ggccgcctgc tggtgggcgg gaatcaagca ggagttcgggatcccctaca atccccagag 3900 tcagggcgtc gtcgagtcta tgaataagga gttaaagaagattatcggcc aggtcagaga 3960 tcaggctgag catctcaaga ccgcggtcca aatggcggtattcatccaca atttcaagcg 4020 gaaggggggg attggggggt acagtgcggg ggagcggatcgtggacatca tcgcgaccga 4080 catccagact aaggagctgc aaaagcagat taccaagattcagaatttcc gggtctacta 4140 cagggacagc agaaatcccc tctggaaagg cccagcgaagctcctctgga agggtgaggg 4200 ggcagtagtg atccaggata atagcgacat caaggtggtgcccagaagaa aggcgaagat 4260 cattagggat tatggcaaac agatggcggg tgatgattgcgtggcgagca gacaggatga 4320 ggattag 4327 15 22 RNA Artificial SequenceDescription of Artificial Sequence Illustrative helix II sequence 15cugaugaggc cgaaaggccg aa 22 16 22 RNA Human immunodeficiency virus type1 16 uaguaagaau 
guauagcccu ac 22
17 22 RNA Human immunodeficiency virus type 1 17 aacccagauu guaagacuau uu 22
18 22 RNA Human immunodeficiency virus type 1 18 uguuucaauu guggcaaaga ag 22
19 22 RNA Human immunodeficiency virus type 1 19 aaaaagggcu guuggaaaug ug 22
20 22 RNA Human immunodeficiency virus type 1 20 acgaccccuc gucacaauaa ag 22
21 22 RNA Human immunodeficiency virus type 1 21 ggaauuggag guuuuaucaa ag 22
22 22 RNA Human immunodeficiency virus type 1 22 auauuuuuca guucccuuag au 22
23 22 RNA Human immunodeficiency virus type 1 23 uggaugauuu guauguagga uc 22
24 22 RNA Human immunodeficiency virus type 1 24 cuuuggaugg guuaugaacu cc 22
25 22 RNA Human immunodeficiency virus type 1 25 cagcuggacu gucaaugaca ua 22
26 22 RNA Human immunodeficiency virus type 1 26 aacuuucuau guagaugggg ca 22
27 22 RNA Human immunodeficiency virus type 1 27 aaggccgccu guuggugggc ag 22
28 22 RNA Human immunodeficiency virus type 1 28 uaagacagca guacaaaugg ca 22
29 30 DNA Artificial Sequence Description of Artificial Sequence Primer 29 cagctgctcg agcagctgaa gcttgcatgc 30
30 34 DNA Artificial Sequence Description of Artificial Sequence Primer 30 gtaagttatg taacggacga tatcttgtct tctt 34
31 37 DNA Artificial Sequence Description of Artificial Sequence Primer 31 cgcatagtcg acgggcccgc cactgctaga gattttc 37
32 116 DNA Artificial Sequence Description of Artificial Sequence Synthetic oligonucleotide 32 tcgaggtcga ctggtggaca gggaaggatt cgaaccttcg aagtcgatga cgtagagaaa 60 aaatggtggc agtagaagga ttcgaacctt cgaagtcgat gacgtcatcc ccgggc 116
33 110 DNA Artificial Sequence Description of Artificial Sequence Synthetic oligonucleotide 33 tcgaggtcga ctggtggaac tggaaggatt cgaaccttcg aagtcgatga cgttcctaaa 60 aaatggtgaa tcatgaagga ttcgaacctt cgaagtcgat gacgtaatac 110
34 110 DNA Artificial Sequence Description of Artificial Sequence Synthetic oligonucleotide 34 tcgaggtcga ctggtgggcc ccgaaggatt cgaaccttcg aagtcgatga cgtggaaaaa 60 aaatggtggg aagagaagga ttcgaacctt cgaagtcgat gacgttggcc 110
35 110 DNA Artificial Sequence Description of Artificial Sequence Synthetic oligonucleotide 35 tcgaggtcga ctggtgacag cagaaggatt cgaaccttcg aagtcgatga cgttcagaaa 60 aaatggtgaa gcaagaagga ttcgaacctt cgaagtcgat gacgtagccc 110
36 110 DNA Artificial Sequence Description of Artificial Sequence Synthetic oligonucleotide 36 tcgaggtcga ctggtgtaag aagaaggatt cgaaccttcg aagtcgatga cgttataaaa 60 aaatggtgac cggtgaagga ttcgaacctt cgaagtcgat gacgttatac 110
37 116 DNA Artificial Sequence Description of Artificial Sequence Synthetic oligonucleotide 37 tcgaggcatg cgtcgactgg tgggcctaga aggattcgaa ccttcgaagt cgatgacgtg 60 cacaaaaaat ggtgaactac gaaggattcg aaccttcgaa gtcgatgacg tgtacc 116
38 12 DNA Human immunodeficiency virus type 1 38 atgggtgcga ga 12
39 12 DNA Human immunodeficiency virus type 1 39 gatgaggatt ag 12
40 12 DNA Artificial Sequence Description of Artificial Sequence gagpol-SYNgp-codon optimised gagpol sequence 40 atgggcgccc gc 12
41 12 DNA Artificial Sequence Description of Artificial Sequence gagpol-SYNgp-codon optimised gagpol sequence 41 gatgaggatt ag 12
42 12 DNA Human immunodeficiency virus type 1 42 atgagagtga ag 12
43 12 DNA Human immunodeficiency virus type 1 43 gctttgctat aa 12
44 12 DNA Artificial Sequence Description of Artificial Sequence SYNgp-160nm-codon optimised env sequence 44 atgagggtga ag 12
45 12 DNA Artificial Sequence Description of Artificial Sequence SYNgp-160nm-codon optimised env sequence 45 gcgctgctgt aa 12
46 34 RNA Human immunodeficiency virus type 1 46 ggcucgaacu ugucgugguu aucguggaug uguc 34
47 63 RNA Artificial Sequence Description of Artificial Sequence EGS based on Tyrosol t-RNA 47 cgauagcaga cucuaaaucu gccgucaucg acuucgaagg uucgaauccu ucccaggaca 60 cca 63
48 66 RNA Artificial Sequence Description of Artificial Sequence Consensus EGS sequence 48 nnnnnnnagc agacucuaaa ucugccguca ucgacuucga agguucgaau ccuucnnnnn 60 ncacca 66
49 49 RNA Artificial Sequence Description of Artificial Sequence Consensus EGS sequence 49 nnnnnnnacg ucaucgacuu cgaagguucg aauccuucnn nnnncacca 49
50 13 RNA Human immunodeficiency virus type 1 50 gggccuauag cac 13
51 13 RNA Human immunodeficiency virus type 1 51 gaacuacuag uac 13
52 13 RNA Human immunodeficiency virus type 1 52 guaagaaugu aua 13
53 13 RNA Human immunodeficiency virus type 1 53 gaccgguucu aua 13
54 13 RNA Human immunodeficiency virus type 1 54 gacagcaugu cag 13
55 13 RNA Human immunodeficiency virus type 1 55 gaagcaauga gcc 13
56 13 RNA Human immunodeficiency virus type 1 56 gggccccuag gaa 13
57 13 RNA Human immunodeficiency virus type 1 57 gggaagaucu ggc 13
58 13 RNA Human immunodeficiency virus type 1 58 ggaacuguau ccu 13
59 13 RNA Human immunodeficiency virus type 1 59 gaaucuauga aua 13
60 13 RNA Human immunodeficiency virus type 1 60 ggacagguaa gag 13
61 13 RNA Human immunodeficiency virus type 1 61 ggcaguauuc auc 13
62 46 DNA Artificial Sequence Description of Combined DNA/RNA Molecule Anti-HIV EGS construct 62 gtgcacguca ucgacuucga agguucgaau ccuucuaggc ccacca 46
63 46 DNA Artificial Sequence Description of Combined DNA/RNA Molecule Anti-HIV EGS construct 63 gtacacguca ucgacuucga agguucgaau ccuucguagu ucacca 46
64 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 64 uauaacguca ucgacuucga agguucgaau ccuucuucuu acacca 46
65 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 65 uauaacguca ucgacuucga agguucgaau ccuucaccgg ucacca 46
66 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 66 cugaacguca ucgacuucga agguucgaau ccuucugcug ucacca 46
67 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 67 ggcuacguca ucgacuucga agguucgaau ccuucuugcu ucacca 46
68 46 DNA Artificial Sequence Description of Combined DNA/RNA Molecule Anti-HIV EGS construct 68 ttccacguca ucgacuucga agguucgaau ccuucggggc ccacca 46
69 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 69 gccaacguca ucgacuucga agguucgaau ccuucucuuc ccacca 46
70 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 70 aggaacguca ucgacuucga agguucgaau ccuuccaguu ccacca 46
71 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 71 uauuacguca ucgacuucga agguucgaau ccuucuagau ucacca 46
72 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 72 cucuacguca ucgacuucga agguucgaau ccuucccugu ccacca 46
73 46 RNA Artificial Sequence Description of Artificial Sequence Anti-HIV EGS construct 73 gaugacguca ucgacuucga agguucgaau ccuucuacug ccacca 46
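
The relationship between the 13-nucleotide HIV-1 target sequences (SEQ ID NOs: 50 to 61) and the anti-HIV EGS constructs (SEQ ID NOs: 62 to 73) can be illustrated with a short Python sketch. It is not part of the listing and rests on assumptions: the names `CORE`, `TAIL`, `split_egs` and `arms_match_target` are hypothetical, the pairing of target n with construct n + 12 is inferred from listing order only, and the pairing pattern tested (3' arm against target positions 2 to 7, 5' arm against the last four target positions) is an observation on these sequences rather than a design rule stated in the text.

```python
# Illustrative check only; names and the pairing rule are assumptions.
CORE = "acgucaucgacuucgaagguucgaauccuuc"  # invariant middle of SEQ ID NO: 49
TAIL = "cacca"                             # invariant 3' end of SEQ ID NO: 49
COMPLEMENT = {"a": "u", "u": "a", "g": "c", "c": "g"}

# 13-nt HIV-1 target sequences, keyed by SEQ ID NO.
TARGETS = {
    50: "gggccuauagcac", 51: "gaacuacuaguac", 52: "guaagaauguaua",
    53: "gaccgguucuaua", 54: "gacagcaugucag", 55: "gaagcaaugagcc",
    56: "gggccccuaggaa", 57: "gggaagaucuggc", 58: "ggaacuguauccu",
    59: "gaaucuaugaaua", 60: "ggacagguaagag", 61: "ggcaguauucauc",
}

# 46-nt anti-HIV EGS constructs, keyed by SEQ ID NO.
EGS = {
    62: "gtgcacgucaucgacuucgaagguucgaauccuucuaggcccacca",
    63: "gtacacgucaucgacuucgaagguucgaauccuucguaguucacca",
    64: "uauaacgucaucgacuucgaagguucgaauccuucuucuuacacca",
    65: "uauaacgucaucgacuucgaagguucgaauccuucaccggucacca",
    66: "cugaacgucaucgacuucgaagguucgaauccuucugcugucacca",
    67: "ggcuacgucaucgacuucgaagguucgaauccuucuugcuucacca",
    68: "ttccacgucaucgacuucgaagguucgaauccuucggggcccacca",
    69: "gccaacgucaucgacuucgaagguucgaauccuucucuucccacca",
    70: "aggaacgucaucgacuucgaagguucgaauccuuccaguuccacca",
    71: "uauuacgucaucgacuucgaagguucgaauccuucuagauucacca",
    72: "cucuacgucaucgacuucgaagguucgaauccuucccuguccacca",
    73: "gaugacgucaucgacuucgaagguucgaauccuucuacugccacca",
}


def normalise(seq):
    """Lower-case, drop spaces, and read DNA 't' as RNA 'u'."""
    return seq.lower().replace(" ", "").replace("t", "u")


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def split_egs(egs):
    """Return the variable (5' arm, 3' arm) flanking the invariant core."""
    egs = normalise(egs)
    start = egs.index(CORE)
    if not egs.endswith(TAIL):
        raise ValueError("construct does not end with the expected tail")
    return egs[:start], egs[start + len(CORE):-len(TAIL)]


def arms_match_target(egs, target):
    """True if both arms are complementary to the target as assumed above."""
    target = normalise(target)
    five_arm, three_arm = split_egs(egs)
    return (target[1:1 + len(three_arm)] == reverse_complement(three_arm)
            and target.endswith(reverse_complement(five_arm)))


if __name__ == "__main__":
    for seq_id, target in TARGETS.items():
        print(seq_id, seq_id + 12, arms_match_target(EGS[seq_id + 12], target))
```

Under these assumptions each construct differs from the consensus of SEQ ID NO: 49 only in its two arms, which is what would let one scaffold be retargeted to different sites.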

What is claimed:
 1. A viral vector system comprising: (i) a first nucleotide sequence and a second nucleotide sequence, wherein the first nucleotide sequence encodes an external guide sequence capable of binding to and effecting the cleavage by RNase P of the second nucleotide sequence, or transcription product thereof, wherein the second nucleotide sequence encodes a viral polypeptide required for the assembly of viral particles; and (ii) a third nucleotide sequence encoding a viral polypeptide required for the assembly of viral particles, which third nucleotide sequence has a different nucleotide sequence than the second nucleotide sequence, such that the third nucleotide sequence, or transcription product thereof, is resistant to cleavage directed by the external guide sequence.
 2. The viral vector system according to claim 1, further comprising at least one further nucleotide sequence encoding a gene product capable of binding to and effecting the cleavage, directly or indirectly, of the second nucleotide sequence, or transcription product thereof, wherein the gene product is selected from an external guide sequence, a ribozyme and an anti-sense ribonucleic acid.
 3. A viral vector production system comprising: (i) a viral genome comprising at least one first nucleotide sequence and a second nucleotide sequence, wherein the at least one first nucleotide sequence encodes a gene product capable of binding to and effecting the cleavage, directly or indirectly, of the second nucleotide sequence, or transcription product thereof, wherein the second nucleotide sequence encodes a viral polypeptide required for the assembly of viral particles; (ii) a third nucleotide sequence encoding a viral polypeptide required for the assembly of the viral genome into viral particles, which third nucleotide sequence has a different nucleotide sequence than the second nucleotide sequence such that said third nucleotide sequence, or transcription product thereof, is resistant to cleavage directed by said gene product; wherein at least one gene product is an external guide sequence capable of binding to and effecting the cleavage by RNase P of the second nucleotide sequence.
 4. The viral vector production system according to claim 3, wherein, in addition to an external guide sequence, at least one gene product is selected from a ribozyme and an anti-sense ribonucleic acid.
 5. The viral vector system according to claim 1, wherein the viral vector is a retroviral vector.
 6. The viral vector system according to claim 5, wherein the retroviral vector is a lentiviral vector.
 7. The viral vector system according to claim 6, wherein the lentiviral vector is an HIV vector.
 8. The viral vector system according to claim 5, wherein the polypeptide required for the assembly of viral particles is selected from gag, pol and env proteins.
 9. The viral vector system according to claim 8, wherein at least the gag and pol proteins are from a lentivirus.
 10. The viral vector system according to claim 8, wherein the env protein is from a lentivirus.
 11. The viral vector system according to claim 9, wherein the lentivirus is HIV.
 12. The viral vector system according to claim 3, wherein the third nucleotide sequence is resistant to cleavage directed by the gene product as a result of one or more conservative alterations in the third nucleotide sequence, which remove cleavage sites recognised by the at least one gene product and/or binding sites for the at least one gene product.
 13. The viral vector system according to claim 1, wherein the third nucleotide sequence is adapted to be resistant to cleavage by RNase P.
 14. The viral vector system according to claim 1, wherein the third nucleotide sequence is codon optimised for expression in producer cells.
 15. The viral vector system according to claim 14, wherein the producer cells are mammalian cells.
 16. The viral vector system according to claim 1 comprising a plurality of first nucleotide sequences and third nucleotide sequences as defined in claim 1.
 17. A viral particle comprising the viral vector genome as defined in claim 3 and one or more third nucleotide sequences as defined in claim 3.
 18. A viral particle produced using the viral vector production system according to claim 3.
 19. A method for producing a viral particle which method comprises introducing into a host cell (i) the viral genome as defined in claim 3 (ii) one or more third nucleotide sequences as defined in claim 3 and (iii) nucleotide sequences encoding essential viral packaging components not encoded by the one or more third nucleotide sequences.
 20. A viral particle produced by the method of claim 19.
 21. A pharmaceutical composition comprising the viral particle according to claim 17, together with a pharmaceutically acceptable carrier or diluent.
 22. A method of treating a viral infection, comprising administering to a subject infected with a virus an effective amount of the viral system according to claim 1.
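
Claims 12 and 14 rely on the third nucleotide sequence differing from the second at the nucleotide level while still encoding the same polypeptide. The short Python sketch below illustrates this with the 12-nucleotide fragments of SEQ ID NOs: 38 to 45. It is illustrative only: the function names are hypothetical, the standard genetic code is assumed, and the pairing of each HIV-1 fragment with a codon optimised fragment (38 with 40, 39 with 41, 42 with 44, 43 with 45) is inferred from listing order rather than stated in the text.

```python
# Illustrative only: compare wild-type and codon optimised fragments.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code built in t, c, a, g order ('*' marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# (wild-type HIV-1 fragment, codon optimised fragment); pairing is assumed.
FRAGMENT_PAIRS = {
    "gagpol, SEQ ID NOs 38/40": ("atgggtgcgaga", "atgggcgcccgc"),
    "gagpol, SEQ ID NOs 39/41": ("gatgaggattag", "gatgaggattag"),
    "env, SEQ ID NOs 42/44": ("atgagagtgaag", "atgagggtgaag"),
    "env, SEQ ID NOs 43/45": ("gctttgctataa", "gcgctgctgtaa"),
}


def translate(dna):
    """Translate a DNA string, read in frame from its first base."""
    dna = dna.lower().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_differences(first, second):
    """Number of positions at which two equal-length sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(first, second) if x != y)


if __name__ == "__main__":
    for label, (wild_type, optimised) in FRAGMENT_PAIRS.items():
        print(label, translate(wild_type), translate(optimised),
              count_differences(wild_type, optimised))
```

In this sketch each pair translates to the same peptide even where the nucleotides differ, so a guide sequence matched to the wild-type fragment would, under the stated assumptions, no longer be fully complementary to the optimised one.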